Dr. Guandong Xu Intelligent Web and Information system Department of Computer Science Aalborg University The Research Progress of Recommender Systems in Social Tagging Systems

Outline: Why recommender systems; The state of the art of recommender systems; Social tagging systems; Tag-based recommender systems (personalized recommendation, tag recommendation, user profiling); Open research questions; Conclusion; Appendix: Our recent work on group approaches

Why recommender systems: the Internet computing era brings information overload. Low precision: the retrieved information is not what you need. Low recall: the relevant information is not exhaustively returned.

Why recommender systems: no personalization. Different users are returned the same search results, so personalization or recommendation is needed.

Why Recommender Systems. GroupLens: An open architecture for collaborative filtering of netnews, Resnick, P.; Iacovou, N.; Suchak, M.; Bergstrom, P.; Riedl, J., 1994 ACM Conference on Computer Supported

Why Recommender Systems. Recommender systems (also called recommendation systems or recommendation engines) are an information filtering technique that recommends information items (films, television, video on demand, music, books, news, images, web pages, etc.) that users are likely to be interested in. Two main approaches: the content-based approach and the collaborative filtering approach.

Traditional Recommendation Methodology (Categories: Methodology)
Content-based: TF-IDF; Bayesian classifiers; machine learning (clustering, decision trees, artificial neural networks)
Collaborative recommendation: memory-based; model-based (K-means clustering, Bayesian model, probabilistic relational model, linear regression)
Hybrid: adding content-based characteristics to collaborative models; adding collaborative characteristics to content-based models; a single unifying recommendation model

Recommender System Categories. Content-based recommendations: recommend items similar to those the user preferred. Collaborative recommendations: recommend items preferred by users with similar taste (User1, User2). Hybrid approaches: content-based and collaborative combined.

Example for Content-based approach. Consider a recommendation scenario. Page 1: Department of Computer Science at Aalborg University. Page 2: Department of Health Science and Technology. Search query: Computer Science. R1 = {"Department", "Computer Science", "Aalborg University"}; R2 = {"Department", "Health Science", "Technology"}. Use TF-IDF (term frequency-inverse document frequency). Result: R1

Example for Content-based approach. Department of Computer Science at Aalborg University: R1 = {"Department", "Computer Science", "Aalborg University"}. Department of Health Science and Technology: R2 = {"Department", "Health Science", "Technology"}. Query: Computer Science. TF-IDF (term frequency-inverse document frequency) result: R1
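The TF-IDF example on this slide can be reproduced with a small sketch. The term lists for R1 and R2 come from the slide; the function names, the lower-casing, and the particular TF-IDF variant (relative term frequency times log of inverse document frequency) are assumptions made for illustration, not the deck's own implementation.

```python
import math
from collections import Counter

# Term sets from the slide, lower-cased (an assumption for matching).
docs = {
    "R1": ["department", "computer science", "aalborg university"],
    "R2": ["department", "health science", "technology"],
}
query = ["computer science"]


def tf_idf_vectors(docs):
    """Return one {term: tf-idf weight} dictionary per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        vectors[name] = {
            term: (count / len(terms)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def score(query, vector):
    """Sum the tf-idf weights of the query terms found in the document."""
    return sum(vector.get(term, 0.0) for term in query)


vectors = tf_idf_vectors(docs)
ranking = sorted(vectors, key=lambda name: score(query, vectors[name]), reverse=True)
```

"Department" occurs in both pages, so its weight is zero and it cannot separate them; "Computer Science" occurs only in R1, so R1 is ranked first, as on the slide.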

Principle of Collaborative filtering. Two kinds of approaches. User-based: select the K most similar users (KNN), also called memory-based. Item-based: select the closest item set, also called model-based.
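A minimal sketch of the user-based (KNN) approach, under stated assumptions: the rating data, the user and item names, the cosine similarity, and the similarity-weighted average are all illustrative choices, since the slide does not fix a similarity measure or a prediction formula.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob": {"item1": 4, "item2": 3, "item3": 5, "item4": 4},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    neighbours = sorted(
        (
            (cosine(ratings[user], other), other_ratings[item])
            for name, other_ratings in ratings.items()
            for other in [other_ratings]
            if name != user and item in other_ratings
        ),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour: the cold-start case
    return sum(sim * rating for sim, rating in neighbours) / total


prediction = predict("alice", "item4")
```

Here bob is more similar to alice than carol is, so the prediction for item4 lands closer to bob's rating of 4 than to carol's rating of 2.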

Example for Collaborative Filtering. Example 2, Amazon.com: the algorithm generates recommendations based on the observation that customers who bought this book also bought other books (they have similar preferences to the user).

Recommendations

Similarities and Differences. Content-based recommendations: similarity is computed over vectors of TF-IDF weights. Collaborative filtering recommendations: similarity is computed over vectors of the actual user-specified ratings.

Limitations (Categories: Limitations)
Content-based (by keywords): limited content analysis; over-specialization; new user problem
Collaborative recommendation: new user problem (cold start); new item problem (cold start); sparsity problem
Hybrid: N/A

Some Extending Capabilities (1/2). Comprehensive understanding of users and items. Extensions for model-based recommendation techniques. Multidimensionality of recommendations: extend the 2-dimensional User x Item space to a multi-dimensional space (User x Item x other1 x other2), e.g., User, Movie, Time, Place.

Some Extending Capabilities (2/2). Multi-criteria ratings: e.g., a restaurant rated on food, décor, service, and price. Non-intrusiveness. Flexibility: users' flexibility. Effectiveness of recommendations: related metrics.

Insights of recommender systems. A closer look at recommender systems from different perspectives

What to do with data - implementation. Two kinds of problems with data. Information retrieval (IR): static content, dynamic query -> modeling content (organized with an index). Information filtering (IF): dynamic content, static query -> modeling query (organized as filters). Recommendation lies between IR and IF, since the content varies slowly and the queries depend on few parameters. Methods from both IR and IF are therefore used to reduce computation at query time.

General purpose. Top-k filtering: list of "best" items (main usage) or anti-spam. Item correlation: find similar items. Rating prediction: predict the rating for any pair of a user and an item (more general).
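Top-k filtering can be seen as a thin layer over rating prediction: once a score exists for every candidate item, the list of "best" items is the k highest-scoring ones. The sketch below assumes the predicted scores are already available; the item names and values are hypothetical.

```python
import heapq

# Hypothetical predicted ratings for one user.
predicted = {"item_a": 4.6, "item_b": 3.1, "item_c": 4.9, "item_d": 2.0}


def top_k(scores, k):
    """Return the k (item, score) pairs with the highest predicted score."""
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])


best = top_k(predicted, 2)
```

Using a heap avoids sorting the whole catalogue when k is much smaller than the number of items.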

Degree of personalization Generic: everyone receives same recommendations Demographic: everyone in the same category receives same recommendations Contextual: recommendation depends only on current activity Persistent: recommendation depends on long-term interests

What the data can be. Context of the current page (current request, item currently explored and structured content about this context). History of the current user on the system (explicit or implicit ratings). History of all users on the system. History of the current user on multiple systems, the whole web or even on their computer. History of all users on multiple systems, the whole web or even their computers.

How to design a Recommender System. Explicit data: rating data (rate a film on Netflix, Like or Dislike on YouTube). Implicit data: logs (users' activities, the implicit feedback). A recommender system is based on users' data.

Emergence of New Recommendation Approaches. Collaborative filtering (social recommender), compared with the traditional content-based approach. Recommendation from friends: daily recommendation from friends; news feeds, Facebook, re-tweets. Recommendation over social media (blogs, YouTube). Recommendation by using social data: social networks, social tagging.

Multi-Relational Social Data http://www.dasfa.net/wiki/index.php?title=Image:Metafac.png Node: facet Hyper-edge: relationship We are in a big social network.

Recommendation from friends- Facebook

Social recommendation by social media

Social relationship is powerful. G. Groh et al. Recommendations in taste related domains, GROUP'07, November 4-7, 2007, Sanibel Island, Florida, USA. The social filtering (SF) approach outperforms the collaborative filtering (CF) approach in the experiments (SF vs. CF).

Recommender System Overview. Input: user-item ratings; social relations; social tagging; query; time; location. Algorithms: user-item KNN; clustering-based; graph-based; matrix factorization; information diffusion; probabilistic model. Output: information items; tags; merchandise/ads; persons; community.

Tags are personal annotations (figure: User1 to User5 tagging resources). A tag is: metadata; an index; a user's expression of personal opinion; an implicit rating or vote on the tagged information resources or items.

Tagging Types Self-tagging Users can only tag their own contributions Permission-based Users decide who can tag their resources Free-for-all Any user can tag any resource

Tagging support. Blind tagging: users cannot see the other tags assigned to the resource they're tagging. Viewable tagging: users can see the other tags assigned to the resource they're tagging. Suggestive tagging: users see suggested tags for the resource they're tagging.

Aggregation of Tags. Bag-model: the same tag can be assigned to a resource multiple times (Delicious). Set-model: a tag can be applied only once to a resource (Flickr).
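The difference between the two aggregation models shows up directly in the data structure used per resource: a multiset of tags for the bag-model, a plain set for the set-model. The sketch below assumes tag assignments are stored as (user, tag, resource) triples; the sample triples and function names are hypothetical.

```python
from collections import Counter

# Hypothetical (user, tag, resource) assignments.
assignments = [
    ("u1", "python", "r1"),
    ("u2", "python", "r1"),
    ("u3", "tutorial", "r1"),
]


def bag_model(assignments, resource):
    """Bag-model: keep how many times each tag was applied to the resource."""
    return Counter(tag for _, tag, res in assignments if res == resource)


def set_model(assignments, resource):
    """Set-model: each tag counts at most once per resource."""
    return {tag for _, tag, res in assignments if res == resource}


bag = bag_model(assignments, "r1")
tags = set_model(assignments, "r1")
```

The bag keeps the information that two users chose "python", which can act as an implicit vote; the set discards it.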

Tag temporal behavior over time. Tag convergence: the tags assigned to a certain Web resource tend to stabilize, and a few tags become the majority. Tag divergence: tag sets don't converge to a smaller group of more stable tags, and the tag distribution continually changes. Tag periodicity: tags evolve and decay with time.

Tag-based Recommender System (RS). [Figure: users, resources, and tags; users are linked to resources through tag sets such as {t1, t2, t3}, {t7, t2, t5}, {t1, t8, t7}, and {t1, t8, t9}.]

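Extension of the user-item matrix (Tso-Sutter et al. 2008). User tags are viewed as items and item tags are viewed as users, which reduces the three-dimensional user-item-tag relation to two-dimensional matrices.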

Folksonomy model. Definition: a folksonomy is a quadruple $F := (U, T, R, Y)$, where $U$, $T$, $R$ are finite sets of instances of users, tags, and resources, and $Y$ defines a relation, the tag assignment, between these sets, that is, $Y \subseteq U \times T \times R$. Converting the folksonomy into an undirected graph: first we convert the folksonomy $F = (U, T, R, Y)$ into an undirected tripartite graph $G_F = (V, E)$ as follows. $V = U \cup T \cup R$ and $E = \{\{u, t\}, \{t, r\}, \{u, r\} \mid (u, t, r) \in Y\}$, with each edge $\{u, t\}$ being weighted with $|\{r \in R : (u, t, r) \in Y\}|$, each edge $\{t, r\}$ with $|\{u \in U : (u, t, r) \in Y\}|$, and each edge $\{u, r\}$ with $|\{t \in T : (u, t, r) \in Y\}|$. Employ: the adapted PageRank algorithm.
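The conversion from tag assignments to the weighted tripartite graph can be sketched as follows. The set Y of (user, tag, resource) triples and the function name are hypothetical; the weights follow the definition on the slide, so the weight of a user-tag edge is the number of resources that user labelled with that tag, and likewise for the other two edge types.

```python
from collections import Counter

# Hypothetical tag assignments Y as (user, tag, resource) triples.
Y = {
    ("u1", "t1", "r1"),
    ("u1", "t2", "r1"),
    ("u2", "t1", "r1"),
    ("u2", "t1", "r2"),
}


def folksonomy_graph(assignments):
    """Return the three weighted edge sets of the tripartite graph G_F."""
    user_tag, tag_resource, user_resource = Counter(), Counter(), Counter()
    # Y is a relation, so every triple is counted exactly once.
    for u, t, r in set(assignments):
        user_tag[(u, t)] += 1        # |{r : (u, t, r) in Y}|
        tag_resource[(t, r)] += 1    # |{u : (u, t, r) in Y}|
        user_resource[(u, r)] += 1   # |{t : (u, t, r) in Y}|
    return user_tag, tag_resource, user_resource


user_tag, tag_resource, user_resource = folksonomy_graph(Y)
```

For instance, u2 used t1 on both r1 and r2, so the edge {u2, t1} gets weight 2.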

FolkRank (Hotho et al., ESWC 2006). PageRank: a page is important if there are many pages linking to it, and if those pages are important themselves. FolkRank: a resource which is tagged with important tags by important users becomes important itself. (The same holds, symmetrically, for tags and users.) Unlike the directed Web graph used by PageRank, the FolkRank graph is undirected. FolkRank can recommend a set of related users and resources for a given tag.
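A minimal sketch of the idea, under loud assumptions: it uses a simplified weight-spreading update w <- d*A*w + (1-d)*p on the undirected weighted graph and takes the FolkRank score as the difference between a run with a boosted preference vector and a run with a uniform one. The damping value, iteration count, node names, and edge weights are all hypothetical, and the published algorithm may differ in its exact update rule and parameters.

```python
def adapted_pagerank(edges, preference, d=0.7, iterations=50):
    """Spread weight over an undirected weighted graph with a preference vector."""
    nodes = sorted({n for edge in edges for n in edge})
    degree = dict.fromkeys(nodes, 0.0)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    total = sum(preference[n] for n in nodes)
    p = {n: preference[n] / total for n in nodes}   # normalised preference
    w = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(iterations):
        spread = dict.fromkeys(nodes, 0.0)
        for (a, b), weight in edges.items():
            # Undirected edge: weight flows in both directions.
            spread[b] += w[a] * weight / degree[a]
            spread[a] += w[b] * weight / degree[b]
        w = {n: d * spread[n] + (1 - d) * p[n] for n in nodes}
    return w


def folkrank(edges, boosted, d=0.7):
    """Score = ranking with boosted preferences minus the baseline ranking."""
    nodes = {n for edge in edges for n in edge}
    uniform = dict.fromkeys(nodes, 1.0)
    biased = {**uniform, **boosted}
    baseline = adapted_pagerank(edges, uniform, d)
    focused = adapted_pagerank(edges, biased, d)
    return {n: focused[n] - baseline[n] for n in nodes}


# Hypothetical graph: prefixes mark tags (t:), users (u:), resources (r:).
edges = {
    ("t:python", "u:ann"): 2, ("t:python", "r:doc1"): 2, ("u:ann", "r:doc1"): 1,
    ("t:java", "u:bob"): 1, ("t:java", "r:doc2"): 1, ("u:bob", "r:doc2"): 1,
}
scores = folkrank(edges, {"t:python": 10.0})
```

Boosting the tag "python" raises the scores of the user and resource connected to it and lowers those of unrelated nodes, which is how related users and resources for a given tag can be read off the ranking.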

Difference highlights. Documents that are of potential interest to a user can be suggested to them. When using a certain tag, other related tags can be suggested. FolkRank additionally considers the tagging behavior of other users. Other users that work on related topics can be made explicit, thus improving knowledge transfer within organizations and fostering the formation of communities.

Tensor Factorization (Symeonidis et al. 2008; Rendle et al. 2009)

Tensor factorization. HOSVD (Symeonidis et al., TKDE 2010): the basic idea is to optimize the square loss. Other optimization measures can be used, e.g., AUC (Area Under Curve) (Rendle et al., SIGKDD 2009).
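The formula for the square loss did not survive on this slide. One common way to write a Tucker-style tensor factorization and its square loss is given below as a sketch; the symbols $\hat{C}$ (core tensor), $\hat{U}$, $\hat{T}$, $\hat{R}$ (factor matrices for users, tags, and resources) and the 0/1 encoding of the tag assignments are assumptions, not necessarily the notation used in the original slide.

```latex
% Assumed encoding: y_{u,t,r} = 1 if (u,t,r) \in Y, and 0 otherwise.
\hat{y}_{u,t,r} \;=\; \sum_{\tilde{u}} \sum_{\tilde{t}} \sum_{\tilde{r}}
    \hat{c}_{\tilde{u},\tilde{t},\tilde{r}} \,
    \hat{u}_{u,\tilde{u}} \, \hat{t}_{t,\tilde{t}} \, \hat{r}_{r,\tilde{r}}

\min_{\hat{C},\,\hat{U},\,\hat{T},\,\hat{R}} \;
    \sum_{(u,t,r) \,\in\, U \times T \times R}
    \left( \hat{y}_{u,t,r} - y_{u,t,r} \right)^{2}
```

Under this reading, the square loss fits every cell of the user-tag-resource tensor, whereas a ranking measure such as AUC only asks that observed tags score higher than unobserved ones.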

Others. The GroupMe! system (Abel et al. 2007). PLSA (Probabilistic Latent Semantic Analysis) (Wetzker et al. 2009). Tag-based profile construction: naive (Szomszor et al. 2007), co-occurrence (Michlmayr and Cayzer 2007), and adaptation approach (Dorigo and Caro 1999). WebDCC (Web Document Conceptual Clustering) (Godoy and Amandi 2006). Music recommendation system (Uitdenbogerd and van Schyndel 2002).

The limitations. Tags have little semantics and many variations. The correlation between sets of tags. Uncontrolled vocabulary: users tag in their own ways. Redundancy and ambiguity in the tag database. Tags do not describe the document, but a judgment. Tags in different languages, e.g., Vienna and Wien.

Data Quality. How to manage the cold start problem (new user, new item) or, more generally, data sparsity? The system must have a special behavior for users with few ratings (e.g., non-personalized recommendation). The system may use bot users to rate new items according to their content.

Confidence and display. How to improve confidence in the recommender system? By providing good recommendations! By providing information about each recommendation (e.g., ratings, explanations). How to display recommendations? The recommended item must be easy for the user to identify and evaluate. Ratings must be easy to understand and meaningful. Explanations must provide a quick way for the user to evaluate the recommendation.

Interaction (1/2). How to interact with the user? You may ask the user to correct a prediction. You must then update your rating matrix with this correction and update your recommendations accordingly. You may want to learn the key parameters of your algorithm using the feedback. You may ask the user to provide feedback on the explanation. You may ask the user to provide more context for the current task (e.g., by using categories).

Interaction (2/2). How to manage scalability? Applications usually need real-time prediction computation. The computation time has to scale with the number of users and items. How to manage temporal changes? You cannot run your algorithms each time a modification occurs. The off-line computation must be robust to small modifications and scheduled accordingly. The on-line computation must benefit from modifications. The computation must be done incrementally when possible. The system may "forget" older information.
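One way a system can "forget" older information is to weight each rating by its age, so that old ratings fade without being deleted. The sketch below assumes an exponential decay with a configurable half-life; the function names, the 30-day default, and the sample ratings are hypothetical choices, not something the slide prescribes.

```python
def decayed_weight(age_days, half_life_days=30.0):
    """Weight that halves every `half_life_days` days."""
    return 0.5 ** (age_days / half_life_days)


def decayed_average(ratings, now_day, half_life_days=30.0):
    """Time-weighted mean of (rating, day) pairs; recent ratings count more."""
    weights = [decayed_weight(now_day - day, half_life_days) for _, day in ratings]
    total = sum(weights)
    return sum(w * rating for w, (rating, _) in zip(weights, ratings)) / total


# Hypothetical history: an old low rating and a recent high rating.
history = [(2.0, 0), (5.0, 60)]
current_taste = decayed_average(history, now_day=60)
```

With a 30-day half-life the 60-day-old rating keeps a quarter of its weight, so the estimate moves from the plain mean of 3.5 towards the recent rating.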

Data and security (1/2). How to ensure privacy? If the profile is public, there are no privacy issues. If the profile is private, the system should avoid giving away too much information, using anonymity techniques. This problem is even worse across systems.

Data and security (2/2). How to design algorithms that are robust against manipulation? Attacks are characterized by a number of false users and knowledge of the system. The attacker wants to modify the distribution of the ratings without being easy to detect. There are many known attacks, such as the sampling attack, random attack, average attack, bandwagon attack... Many techniques exist to detect attacks: find profiles that are unlikely according to the global distribution of profiles, find profile updates that are unlikely according to the global distribution of updates...

Improvement (1/4). How to manage diversity? Recommending very close items could be counter-productive (since they may be substitutes) -> Systems can use correlation between items (e.g., based on content) to filter items. Recommending what everybody likes and what the user already knows is not really interesting -> Systems can try riskier predictions (e.g., high score with low confidence).
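Filtering near-substitutes by item correlation can be sketched as a greedy pass over the ranked list: an item is kept only if it is not too similar to anything already selected. The Jaccard similarity over content features, the threshold, and the sample items are assumptions made for illustration.

```python
def jaccard(a, b):
    """Content similarity between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def diversify(ranked, features, k, max_sim=0.4):
    """Greedily keep top-ranked items that are not too similar to kept ones."""
    selected = []
    for item, _score in ranked:
        if all(jaccard(features[item], features[s]) <= max_sim for s in selected):
            selected.append(item)
        if len(selected) == k:
            break
    return selected


# Hypothetical content features and predicted scores (best first).
features = {
    "phone_case_red": {"case", "phone", "red"},
    "phone_case_blue": {"case", "phone", "blue"},
    "charger": {"charger", "phone"},
    "headphones": {"audio"},
}
ranked = [
    ("phone_case_red", 0.95),
    ("phone_case_blue", 0.94),
    ("charger", 0.80),
    ("headphones", 0.75),
]
picks = diversify(ranked, features, k=2)
```

The blue case scores almost as high as the red one but is a likely substitute, so the second slot goes to the charger instead.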

Improvement (2/4). How to use social networks to improve recommendations? Users are likely to like what their friends like. Exploring the social graph is a direct way to make recommendations. Correlation between users could be biased by the social graph. Potential friends could be suggested using recommendation techniques.

Improvement (3/4). How to recommend for a group? The recommendation for the group could be an aggregation of the recommendations for its members. The group could be seen as a single user (with aggregation functions to reconcile ratings).
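Treating the group as a single user comes down to choosing an aggregation function over the members' predicted ratings. The sketch below shows two common choices, the average and "least misery" (the minimum); the strategy names, member names, and ratings are hypothetical, and the slide does not say which function to prefer.

```python
def aggregate(member_predictions, how="average"):
    """Combine per-member predicted ratings into one group rating per item."""
    combine = {
        "average": lambda values: sum(values) / len(values),
        "least_misery": min,   # the group is as happy as its unhappiest member
    }[how]
    # Only items with a prediction for every member are considered.
    items = set.intersection(*(set(p) for p in member_predictions.values()))
    return {
        item: combine([p[item] for p in member_predictions.values()])
        for item in items
    }


# Hypothetical predicted ratings for a two-person group.
group = {
    "ann": {"film_a": 5.0, "film_b": 3.0},
    "bob": {"film_a": 2.0, "film_b": 3.5},
}
by_average = aggregate(group, "average")
by_least_misery = aggregate(group, "least_misery")
```

The two strategies can disagree: the average favours film_a, which ann loves, while least misery favours film_b, which nobody dislikes.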

Improvement (4/4). How to evaluate recommendations? There is a lot of noise in the data, which could be the main source of errors. Evaluation is more difficult when there are no ratings. It is even more difficult if you want to improve recommendations by adding constraints such as diversity.

Possible further research. Incorporating relevance feedback into recommendation. Using tag clouds to improve user experience. Examining quantitative aspects of folksonomies and the use of tags. Examining user behavior based on implicit feedback. Tag ambiguity and sparsity. Tag uniformity.

Conclusion. The rationale of recommender systems. The state-of-the-art progress of recommender systems. Social tagging systems demand new recommender systems. Techniques in tag-based recommender systems. Some open research questions and possible further directions.
