A Scalable Solution for Personalized Recommendations in Large-scale Social Networks
Sardianos Christos, Varlamis Iraklis
Harokopio University of Athens, Department of Informatics & Telematics
{sardianos}{varlamis}@hua.gr
PCI 2014, Athens October 2-4, 2014 18th Panhellenic Conference in Informatics
Role of Recommender Systems
In many Web 2.0 applications, users interact with the application through social activity. They can express their trust in another user or in another user's review.
A recommender system is responsible for recommending items (e.g. products, articles) to users, based on their previous activity.
With existing techniques, this can be a difficult process in large social and bipartite graphs.
Structure of Recommender Systems
We consider two types of entities:
• Users
• Items
Users express their preferences for some of the available items by rating them (directly or indirectly).
These preferences are usually represented in a user rating matrix, also called a utility matrix.
System's goal: predict a user's preference for the items they have not "rated" yet, and recommend the k items most likely to be preferred.
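The idea of predicting unrated items from a utility matrix can be illustrated with a small sketch. The users, items, and ratings below are made up, and the cosine-weighted User-User predictor is only one common way to do this; it is not taken from the slides.

```python
from math import sqrt

# Toy utility matrix: user -> {item: rating}. All names and values are hypothetical.
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=2):
    """Score each unrated item as a similarity-weighted mean of other users'
    ratings and return the k highest-scoring (item, score) pairs."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, r in other_ratings.items():
            if item in ratings[user]:
                continue  # only predict items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * r
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {i: scores[i] / weights[i] for i in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Calling `recommend("alice", ratings, k=1)` returns the single unrated item `i4` with a score that lies between the two ratings other users gave it.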
Recommender Systems Approaches
There are many Recommender Systems approaches, which can be broadly grouped into the following categories.*
Collaborative Filtering (CF)
Content-based
Hybrid Systems
* P. Melville, V. Sindhwani. "Recommender Systems", Encyclopedia of Machine Learning, Springer, 2010.
Limitations of Existing Approaches
Social networks such as Facebook & Twitter have over 1.5BN & 95M users respectively. At this scale, scalability becomes a major limitation for Recommender Systems.
Generating recommendations for users about whom the system has insufficient information (Cold-Start users) is a known issue of Recommender Systems.
Scientific Research Question-Definition
Is it possible to achieve equally good recommendations by applying CF over subgraphs of the original graph?
Is it possible to use these subgraphs for providing a solution for the Cold-Start problem?
Proposed Solution: The creation of subgraphs based on social information content.
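The slides do not spell out how the subgraphs are built, so the following is only a stand-in sketch: it walks a trust graph in breadth-first order and cuts that order into s groups of roughly equal size, then restricts the ratings to one group so that CF can run on it alone. The function names (`partition_users`, `ratings_for`) and the BFS-chunking heuristic are assumptions made for illustration, not the authors' partitioning method.

```python
from collections import deque

def bfs_order(users, adj):
    """Visit every user in breadth-first order, so trusted neighbours end up close together."""
    seen, order = set(), []
    for seed in users:
        if seed in seen:
            continue
        seen.add(seed)
        queue = deque([seed])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(adj[u]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return order

def partition_users(users, trust_edges, s):
    """Split users into at most s groups of near-equal size along the BFS order."""
    adj = {u: set() for u in users}
    for a, b in trust_edges:
        adj[a].add(b)
        adj[b].add(a)
    order = bfs_order(users, adj)
    cap = -(-len(order) // s)  # ceiling division: maximum group size
    return [order[i:i + cap] for i in range(0, len(order), cap)]

def ratings_for(group, ratings):
    """Keep only the (user, item, rating) triples whose user belongs to the group."""
    members = set(group)
    return [(u, i, r) for (u, i, r) in ratings if u in members]
```

Each group returned by `partition_users` can then be passed to `ratings_for` to obtain the smaller rating set on which a CF algorithm would be trained independently.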
We performed 5-fold cross-validation over the training and testing samples.
The numbers of subgraphs examined were s = {1, 2, 4, 8, 16, 33, 65, 125, 250, 500, 1000}, using the whole neighborhood as the k-nearest neighbors.
For s = {4, 65, 1000} we examined the performance of the User-User algorithm for different neighborhood sizes (knn), with k = {1, 3, 5, 10, 25, 50, 100, 500, 1000}.
The number of features used for training by the FunkSVD algorithm was set to FeatureCount = 100.
The list size for the Top-N nDCG metric was set to N = 5.
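As a reminder of what a Top-N nDCG score measures, here is a minimal sketch of one common formulation, using the held-out rating as the gain and a log2 rank discount. The evaluation toolkit behind the slides may define the gain or discount differently, and the function names are illustrative only.

```python
from math import log2

def dcg(gains):
    """Discounted cumulative gain: the gain at rank r (1-based) is divided by log2(r + 1)."""
    return sum(g / log2(rank + 2) for rank, g in enumerate(gains))

def ndcg_at_n(recommended, true_ratings, n=5):
    """nDCG of the first n recommended items against a user's held-out ratings.

    recommended  : list of item ids, best first
    true_ratings : dict mapping item id -> held-out rating (missing items count as 0)
    """
    gains = [true_ratings.get(item, 0.0) for item in recommended[:n]]
    ideal = sorted(true_ratings.values(), reverse=True)[:n]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A list that ranks the held-out items in descending order of rating scores 1.0, while any other ordering, or a list of items the user never rated, scores lower.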
Evaluation Findings
o Evaluation time drops rapidly as the number of subgraphs increases.
o For s>16 (~7,530 users), the Item-Item algorithm runs faster than User-User and SVD.
o Execution of the Item-Item & User-User algorithms over the full graph was impossible, while the SVD algorithm could not be executed for s<4 (~30,123 users), due to insufficient memory caused by the way the SVD algorithm works.
Evaluation Findings
o The SVD & Item-Item algorithms appear to have normalized gain, unlike User-User, which performs poorly due to the notably large number of items per subgraph.
o The Item-Item algorithm can predict similar items (based on the ratings), while SVD creates a smaller and denser item space. Better performance!
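The "smaller and denser item space" can be pictured as every item being mapped to a short latent vector. The sketch below is a FunkSVD-style matrix factorization trained with stochastic gradient descent; it updates all features jointly, which is a common simplification of Funk's original one-feature-at-a-time training, and the hyperparameters and toy data are assumptions chosen for the example (the experiments above used 100 features).

```python
import random

def train_factors(ratings, n_features=2, epochs=300, lr=0.02, reg=0.02, seed=0):
    """Learn user and item latent vectors from (user, item, rating) triples by SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_features)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_features)] for i in items}
    mean = sum(r for _, _, r in ratings) / len(ratings)  # global-mean baseline
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mean + sum(a * b for a, b in zip(p[u], q[i])))
            for f in range(n_features):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q, mean

def predict(p, q, mean, user, item):
    """Predicted rating: global mean plus the dot product of the two latent vectors."""
    return mean + sum(a * b for a, b in zip(p[user], q[item]))

# Hypothetical toy data: u1 and u2 prefer item "a", u3 prefers item "b".
toy = [("u1", "a", 5), ("u1", "b", 1), ("u2", "a", 4),
       ("u2", "b", 2), ("u3", "a", 1), ("u3", "b", 5)]
```

After training, each item is described by only `n_features` numbers regardless of how many users rated it, which is what makes the factorized space compact and dense.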
Evaluation Findings
o Results are comparable to those from Epinions.
o The User-User algorithm still doesn't perform well, but has more stable behavior.
o There is, however, a larger standard deviation in the performance of the User-User algorithm over each subgraph for the different values of s, unlike the Item-Item & SVD algorithms.
Conclusions
Is it possible to create a model that will take into account the social network of the users for creating personalized recommendations in large-scale social networks?
The performance of the proposed model (CF in subgraphs) is comparable to that of the traditional techniques (CF in the full graph).
In sparse bipartite graphs, the performance of this model may be reduced.
However, using algorithms such as SVD, we can provide a solution even in the case of sparse bipartite graphs.
The proposed approach could be utilized to implement a distributed recommender system, minimizing the execution time and producing high quality recommendations.
Future Work
Deploy the proposed model over a distributed architecture
Partitioning is fast, CF is the bottleneck
• Based on graph (and subgraph) statistics, decide whether to partition or not and decide on the number of partitions
Graph partitioning results in many cross-cluster edges, which are currently ignored
• What happens when we take these edges into account?
Thank you for your time.