Information filtering technique to locate products, services, or information that are relevant and exciting to users based on their historical preferences, utilizing the “wisdom of the crowds”
E.g. editorial picks, aggregates (top views, top downloads, most recent), personalized recommendations
Formally:
U = set of users
I = set of items
Utility function F: relates U to I through a rating R (e.g. 0-5 stars, or a real number), i.e. F: U × I → R
Task: for each user, estimate her preference for items she has not yet seen, given all existing user ratings.
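As a minimal sketch of this formal setup, the known values of F can be stored sparsely as a mapping from (user, item) pairs to ratings; every pair missing from the mapping is one the task asks us to estimate. The user names, item names, and 0-5 star values below are hypothetical.

```python
# Hypothetical users, items, and 0-5 star ratings, for illustration only.
users = {"alice", "bob"}
items = {"m1", "m2", "m3"}

# Known values of the utility function F: U x I -> R, stored sparsely.
ratings = {
    ("alice", "m1"): 5.0,
    ("alice", "m2"): 3.0,
    ("bob", "m3"): 4.0,
}

def unseen_items(user, items, ratings):
    """Return the items this user has not rated yet, i.e. the ones to estimate."""
    return sorted(i for i in items if (user, i) not in ratings)

for u in sorted(users):
    print(u, unseen_items(u, items, ratings))
```

In a real dataset most (user, item) pairs are missing, which is why the sparse mapping is preferable to a dense matrix here.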
Intuition
• In many domains, considerable contextual data in text form is available describing the items being recommended (e.g. movies, e-commerce)
• Standard CF algorithms do not consider latent properties of users/items which may influence a user’s rating decisions on items
• Discovering such latent properties of users/items helps address the sparsity problem, since similarity can be calculated even when a pair of users has no overlapping ratings
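To illustrate the sparsity point, the sketch below takes two users who have rated disjoint sets of items. A similarity restricted to co-rated items (here cosine over the overlap, an assumed stand-in for whatever rating-overlap measure is used) is undefined for them, while cosine similarity of their topic vectors is still defined. The ratings and 3-topic profiles are hypothetical.

```python
import math

# Hypothetical ratings: alice and bob have rated disjoint sets of movies.
ratings = {
    "alice": {"m1": 5.0, "m2": 4.0},
    "bob": {"m3": 5.0, "m4": 3.0},
}

# Hypothetical 3-topic user profiles (e.g. derived from item descriptions).
topic_profiles = {
    "alice": [0.7, 0.2, 0.1],
    "bob": [0.6, 0.3, 0.1],
}

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def overlap_cosine(ra, rb):
    """Cosine over co-rated items only; None when the users share no items."""
    common = sorted(ra.keys() & rb.keys())
    if not common:
        return None
    return cosine([ra[i] for i in common], [rb[i] for i in common])

print(overlap_cosine(ratings["alice"], ratings["bob"]))  # None: no overlap
print(round(cosine(topic_profiles["alice"], topic_profiles["bob"]), 3))
```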
Approach Overview
• Discover user profiles in a latent topic space by leveraging contextual data in text form and users’ historical ratings
• Build a hybrid neighborhood based on a similarity score that considers latent topic space similarity as well as rating-overlap-based similarity
• Recommend items not yet picked by the user based on their popularity within the hybrid neighborhood
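The overview does not say how the two similarity scores are combined, how large the neighborhood is, or how popularity is measured. The sketch below therefore assumes a linear blend with a hypothetical weight alpha (a missing overlap similarity counts as 0), a top-k neighborhood, and popularity as the number of neighbors who picked an item. All function names and data are illustrative, not the authors' implementation.

```python
from collections import Counter

def hybrid_score(topic_sim, overlap_sim, alpha=0.5):
    """Assumed linear blend of the two similarities; None (no overlap) counts as 0."""
    return alpha * topic_sim + (1 - alpha) * (overlap_sim or 0.0)

def hybrid_neighborhood(target, candidates, topic_sims, overlap_sims, k=2, alpha=0.5):
    """Return the k candidate users with the highest hybrid score to the target."""
    scored = [
        (hybrid_score(topic_sims[c], overlap_sims.get(c), alpha), c)
        for c in candidates
        if c != target
    ]
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]

def recommend(target, neighbors, picked, n=3):
    """Rank items the target has not picked by how many neighbors picked them."""
    seen = picked[target]
    counts = Counter(i for v in neighbors for i in picked[v] if i not in seen)
    return [item for item, _ in counts.most_common(n)]

# Hypothetical data: items each user has picked, plus precomputed similarities to alice.
picked = {
    "alice": {"m1", "m2"},
    "bob": {"m1", "m3", "m4"},
    "carol": {"m3", "m5"},
    "dave": {"m6"},
}
topic_sims = {"bob": 0.9, "carol": 0.8, "dave": 0.1}
overlap_sims = {"bob": 0.7}  # only bob shares a picked item (m1) with alice

neighbors = hybrid_neighborhood("alice", picked, topic_sims, overlap_sims, k=2)
print(neighbors)                              # ['bob', 'carol']
print(recommend("alice", neighbors, picked))  # 'm3' first (picked by both neighbors)
```

Note that carol enters alice's neighborhood purely through topic similarity, even though she shares no items with alice; this is the effect the hybrid score is meant to capture.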
Load into memory the matrix I, formed by the item-topic distribution vectors of all item documents.
For each user U, look up the items she has expressed interest in and load them into a list L.
Initialize the current user U’s topic-distribution vector to zeros.
For each item i in L (each item the current user has expressed interest in):
Add the topic-distribution vector of i, multiplied by U’s rating of i normalized by the sum of all ratings from U, to U’s topic-distribution vector.
Persist U’s topic-distribution vector as her user profile.
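The steps above can be sketched in Python as follows, assuming the rows of the matrix I are available as a dict from item to topic distribution and U's ratings as a dict from item to rating. The item names, topic values, and the function name build_user_profile are hypothetical.

```python
def build_user_profile(item_topics, user_ratings):
    """Return U's topic-distribution vector.

    item_topics:  dict item -> topic distribution (the rows of the matrix I)
    user_ratings: dict item -> U's rating, for the items in her list L
    """
    num_topics = len(next(iter(item_topics.values())))
    profile = [0.0] * num_topics              # initialize to zeros
    total = sum(user_ratings.values())        # sum of all ratings from U
    if total == 0:
        return profile                        # no usable ratings: keep zero vector
    for item, rating in user_ratings.items():
        weight = rating / total               # rating normalized by U's rating sum
        for t, p in enumerate(item_topics[item]):
            profile[t] += weight * p          # add the weighted item-topic vector
    return profile

# Hypothetical 2-topic item distributions and one user's ratings.
item_topics = {"m1": [0.8, 0.2], "m2": [0.2, 0.8], "m3": [0.5, 0.5]}
alice_ratings = {"m1": 4, "m2": 1}

profile = build_user_profile(item_topics, alice_ratings)
print([round(x, 2) for x in profile])  # [0.68, 0.32]
```

The persistence step is left out; any store keyed by user (for example a JSON file) would serve. Because the normalized weights sum to 1, the resulting profile is itself a topic distribution whenever the item vectors are.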
Summary: add up the item-topic distributions of the items each user is interested in, each multiplied by the user’s normalized rating, to generate the user’s topic-distribution vector, which represents her profile in the latent topic space
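Written as a formula, with r_{U,i} denoting U's rating of item i and θ_i the topic-distribution vector of item i (a row of the matrix I), both symbols introduced here for illustration, the user profile is a rating-weighted average of item-topic vectors:

$$
\mathbf{p}_U \;=\; \sum_{i \in L} \frac{r_{U,i}}{\sum_{j \in L} r_{U,j}}\,\boldsymbol{\theta}_i
$$

Since the weights are non-negative and sum to 1, p_U is a convex combination of the θ_i, so it is again a probability distribution over topics.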
Standard item-based CF performs the worst, with precision values well below 1%
Standard user-based CF yields precision values below 5%
The proposed HUNR performs the best, with precision at 5 above 31%
Recall at 30 indicates that HUNR retrieves more than 25% of the relevant items, whereas standard user-based CF retrieves less than 5% of them
F-measure analysis also confirms that HUNR significantly outperforms standard CF techniques
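For reference, the metrics quoted in these results can be sketched as below. The sketch assumes precision at k divides the hits by k, recall at k divides them by the number of relevant items, and the F-measure is the balanced F1; the evaluation may also average these per user, which is not shown. The item lists are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical ranked recommendations and held-out relevant items for one user.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "x", "y"}

p = precision_at_k(recommended, relevant, 5)  # 2 hits / 5
r = recall_at_k(recommended, relevant, 5)     # 2 hits / 4 relevant
print(p, r, round(f_measure(p, r), 3))
```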
Standard item-based CF performs the worst, with precision values well below 1%
Standard user-based CF yields precision values around 10%, whereas the proposed HUNR performs the best, with precision at 5 above 38%
Recall at 75 indicates that HUNR retrieves around 24% of the relevant items, whereas standard user-based CF retrieves less than 9% of them
F-measure analysis also indicates that HUNR performs much better than standard CF
We proposed a novel hybrid recommender approach using LDA that utilizes the similarity of users in a latent topic space along with rating-overlap-based similarity to refine neighborhood formation and improve the quality of recommendations.
Empirical evaluations indicate that the technique is well suited to recommender domains where contextual data describing the items being recommended is available in text form.
The proposed approach significantly outperforms standard CF algorithms, which use rating data alone to generate recommendations.