Transcript
Page 1: PhD defense

1

Mathematical Methods of Tensor Factorization

Applied to Recommender Systems

Dott. Giuseppe Ricci – Scuola di Dottorato in Informatica

XXVI Ciclo

PhD Defense – 26 May 2014

Semantic Web Access and Personalization research group – http://www.di.uniba.it/~swap

Dipartimento di Informatica

Page 2: PhD defense

2

Outline

Motivations and Contributions

Information Overload & Recommender Systems

Matrix and Tensor Factorization in RS literature

Proposed solutions

Experimental Evaluation

Summary and Future Work

Page 3: PhD defense

Motivations and Contributions 1/2

Matrix Factorization (MF) techniques have proved to be a quite promising solution to the problem of designing efficient filtering algorithms in the Big Data era.

Several challenges in the Recommender Systems (RS) research area:
• missing values: data sparsity;
• incorporating contextual information: CARS;
• context relevance (weighting) in CARS.

This work focuses on CARS. Objective: to propose new methods to understand which contextual information is relevant, and to use this information to improve the quality of the recommendations.

Page 4: PhD defense

Motivations and Contributions 2/2

Matrix and Tensor Factorization literature review.

CP-WOPT algorithm as a solution for the sparsity of RS data.

CARS and context weighting: 2 proposed solutions to introduce only the relevant contextual information in the recommendation process.

Empirical evaluation of the 2 solutions.

Page 5: PhD defense

5

Information Overload & Recommender Systems

Page 6: PhD defense

Information Overload

Source: www.go-globe.com

Surplus of content compared to the user's ability to find relevant information: the result is that you are either late in making decisions, or you make the wrong ones. The term "Information Overload" was used by the futurologist Alvin Toffler in 1970, when he predicted that the rapidly increasing amount of information being produced would eventually cause people problems.

Page 7: PhD defense

7

Recommender Systems 1/2

Recommender Systems (RS) represent a response to the problem of Information Overload and are now a widely recognized field of research [Ricci].

RS fall in the area of information filtering. With the growing amount of information available on the web, a very sensitive issue is to develop methods that can effectively and efficiently handle large amounts of data.

Mathematical methods have recently proved useful in dealing with this problem in the context of RS.

The search for methods more effective and efficient than those known in the literature is also guided by the interest of industrial research in this field, as evidenced by the Netflix Prize competition.

[Ricci] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.

Page 8: PhD defense

Recommender Systems 2/2

Usually ratings are stored in a matrix called the user-item matrix or rating matrix.

RS calculate a rating estimate for the items/products not yet purchased/tried, and build a suggestion list from the items with the highest rating estimates.


Page 9: PhD defense

Examples of RS

Applications:
• e-commerce
• advertising
• e-mail filtering
• social networks
• …

Page 10: PhD defense

10

Basics of Recommender Systems

Page 11: PhD defense

11

Recommender Systems: definitions

The area of RS is relatively new, dating to the mid-1990s.

Concept: tools and techniques able to provide personalized access to large collections of structured and unstructured data, and to provide users with advice about items they might be interested in.

Some definitions:

[Olsson]: “RS is a system that helps a user to select a suitable item among a set of selectable items using a knowledge-base that can be hand-coded by experts or learned from recommendations generated by the users”.

[Burke]: “RS have the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options”.

[Olsson] Tomas Olsson. Bootstrapping and Decentralizing Recommender Systems. PhD thesis, Department of Information Technology, Uppsala University and SICS, 2003.
[Burke] R. Burke. Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

Page 12: PhD defense

12

RS Classification [Burke]

[Burke] Robin Burke. Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

(Diagram: classification of RS; Context Aware Recommender Systems (CARS) highlighted.)

Page 13: PhD defense

13

Content-Based RS (CBRS)

Assumption: user preferences remain stable over time.

They suggest items similar to those previously labeled as relevant by the target user.

Based on the analysis and exploitation of textual contents since each item to be recommended has to be described by means of textual features.

Needs 2 pieces of information: a textual description of the item and a user profile describing user interests in terms of textual features.

Page 14: PhD defense

14

Collaborative Filtering RS

Assumption: users that shared similar tastes in the past will have similar tastes in the future as well (nearest neighbors).

CF RS rely on a matrix where each user is mapped on a row and each item is represented by a column: the user-item or rating matrix.

A recent trend is to exploit matrix factorization methods. A common technique applied in CF RS is Singular Value Decomposition (SVD).

Page 15: PhD defense

15

Hybrid Recommender Systems

Combining 2 or more classes of algorithms in order to emphasize their strengths and to level out their corresponding weaknesses.

For example, a collaborative system and a content-based system might be combined to compensate for the new-user problem, providing recommendations to users whose profiles are too poor to trigger the collaborative recommendation process.

Burke proposed an analytical classification of hybrid systems, listing a number of hybridization methods to combine pairs of recommender algorithms. In [Burke] 7 different hybridization techniques are introduced.

[Burke] Robin Burke. The Adaptive Web, chapter Hybrid Web Recommender Systems, pages 377–408. Springer-Verlag, Berlin, Heidelberg, 2007.

Page 16: PhD defense

Context

What is context?

One of the most cited definitions of context is that of Dey et al. [Dey], who define context as:

”Any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the applications themselves”.

Bazire and Brézillon [Bazire] examined and compared some 150 different definitions of context from a number of different fields, and concluded that the multifaceted nature of the concept makes it difficult to find a unifying definition.

Li [Li] et al. define 5 context dimensions: who (user), what (object), how (activities), where (location) and when (time).

[Dey] Anind K. Dey. Understanding and using context. Personal Ubiquitous Comput., 5(1):4–7, 2001.
[Bazire] Mary Bazire and Patrick Brézillon. Understanding context before using it. In Proceedings of the 5th International Conference on Modeling and Using Context, CONTEXT'05, pages 29–40, Berlin, Heidelberg, 2005. Springer-Verlag.
[Li] Luyi Li, Yanlin Zheng, Hiroaki Ogata, and Yoneo Yano. A framework of ubiquitous learning environment. In CIT, pages 345–350. IEEE Computer Society, 2004.

Page 17: PhD defense

17

Context Aware RS (CARS)

Context-Aware Recommender Systems (CARS) take into account contextual factors, such as available time, location, people nearby, etc., that identify the context where the product is tried.

We suppose these factors may have a structure: for example, "location" may be defined in terms of home, public place, theatre, cinema, etc.

Page 18: PhD defense

18

Context Aware RS

The challenges of a CARS are:

relevance of contextual factors: it is important to decide which contextual variables are relevant in the recommendation process;

availability of contextual information: relevant contextual factors can be considered as part of the data collection, but such historical contextual information is often not available when designing the system;

extraction of contextual information from the user's activities: these data need to be recorded;

evaluation and lack of publicly available datasets.

Page 19: PhD defense

19

Context Aware RS

CARS incorporate user and item information as well as other types of data, such as context, using them to infer unknown ratings:

f: Users × Items × Contexts → Rating

CARS deal with a quadruple input <user, item, context, rating>, where the recommender records the preference of the user for the selected item according to the contextual information, which describes the situation in which the product was consumed by the user.

Page 20: PhD defense

Paradigms to incorporate context

Pre-filtering: only the ratings matching the target context are used. In a movie RS, if a user wants to see a film one day during the holidays, only the ratings assigned in holidays are used.

Post-filtering: recommendations are first computed ignoring the context and are then filtered according to it.

Contextual Modeling: data are used in the estimation of the ratings by a multidimensional function or by heuristic calculations that incorporate the contextual information in addition to the user and item data.
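The pre-filtering paradigm can be sketched in a few lines (an illustration only; the tuple layout is an assumption, not taken from the thesis):

```python
def prefilter(ratings, context_value):
    """Contextual pre-filtering: keep only the ratings given in the target
    context, then hand them to a plain 2D (user x item) recommender."""
    return [(user, item, r) for (user, item, ctx, r) in ratings
            if ctx == context_value]

# toy data: (user, item, context, rating) quadruples
ratings = [(1, 10, "holiday", 5), (1, 11, "workday", 2), (2, 10, "holiday", 4)]
holiday_only = prefilter(ratings, "holiday")
```

The filtered triples can then be fed to any context-free 2D recommender.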

Page 21: PhD defense

21

Context Weighting

It is not always simple to tell which contextual information is important for a specific scope.

There are many parameters, acting in different manners. Not all acquired contextual information is important for the recommendation process: some contextual variables can introduce noise and degrade the quality of the suggestions.

For each user, we want to know which contextual information is helpful for giving more precise and reliable recommendations.

PROBLEM: users may rate items in different contexts, but it is not guaranteed that we can find dense contextual ratings under the same context, i.e. there may be very few users who have rated the items in the same contexts.

Solutions: 2 branches: Context Selection (survey) and Context Relaxation (binary selection).

Page 22: PhD defense

22

Matrix Factorization in RS literature

Page 23: PhD defense

23

Background

With the ever-increasing information available, the challenge of implementing personalized filters has become the challenge of designing algorithms able to manage huge amounts of data for the elicitation of user needs and preferences.

Matrix Factorization techniques have proved to be a quite promising solution.

MF techniques fall into the class of CF methods and, particularly, into the class of latent factor models: the similarity between users and items is induced by some factors hidden in the data.

We will focus our attention on Singular Value Decomposition (SVD).

Page 24: PhD defense

24

Basics of MF

U: set of users

D: set of items

R: the matrix of ratings.

MF aims to factorize R into two matrices P and Q such that their product approximates R: R ≈ P × Q^T.

A factorization used in the RS literature is Singular Value Decomposition (SVD), popularized by Simon Funk during the Netflix Prize. SVD objective: reducing the dimensionality, i.e. the rank, of the user-item matrix, in order to capture the latent relationships between users and items.
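The factorization R ≈ P × Q^T can be illustrated with a small stochastic-gradient sketch over the known entries (a toy illustration, not the thesis implementation; the matrix, hyperparameters and seed are arbitrary):

```python
import random

def mf_sgd(R, k=2, steps=3000, lr=0.01, reg=0.02, seed=7):
    """Stochastic gradient descent on the known entries of R (0 = missing):
    learn P (users x k) and Q (items x k) so that P . Q^T approximates R."""
    rnd = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    P = [[rnd.uniform(0.01, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rnd.uniform(0.01, 0.1) for _ in range(k)] for _ in range(n_items)]
    known = [(u, i) for u in range(n_users) for i in range(n_items) if R[u][i] > 0]
    for _ in range(steps):
        for u, i in known:
            err = R[u][i] - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # regularized update
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

R = [[5, 3, 0],
     [4, 0, 1],
     [1, 1, 5]]
P, Q = mf_sgd(R)
predict = lambda u, i: sum(P[u][f] * Q[i][f] for f in range(2))
```

The regularization term (reg) is the usual guard against overfitting the few known entries; the zeros in R are then filled by `predict`.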

Page 25: PhD defense

SVD in RS Literature 1/2

Sarwar: SVD-based algorithm. Low-rank approximation: retaining only the k << r biggest singular values and discarding the other entries.

Koren: SVD-based algorithms (Asymmetric-SVD, SVD++), exploiting explicit and implicit feedback and baseline estimates.

Julià: Alternation Algorithm, an alternative to SVD with the same aim. The alternation between the user-factors vector pu and the item-factors vector qi makes it possible to deal with missing values.
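Julià's alternation idea — fix the item factors, solve for the user factors over the known cells only, then swap — can be sketched with a single factor (a one-factor simplification of mine; the actual algorithm uses more factors and a more careful initialization):

```python
def rank1_alternation(R, iters=50):
    """One-factor alternation: fix item vector q and solve for user vector p
    in closed form over the known entries (None = missing), then fix p and
    solve for q; summing over known cells only is what handles the holes."""
    n, m = len(R), len(R[0])
    p = [1.0] * n
    q = [1.0] * m
    for _ in range(iters):
        for u in range(n):
            num = sum(R[u][i] * q[i] for i in range(m) if R[u][i] is not None)
            den = sum(q[i] ** 2 for i in range(m) if R[u][i] is not None)
            if den:
                p[u] = num / den
        for i in range(m):
            num = sum(R[u][i] * p[u] for u in range(n) if R[u][i] is not None)
            den = sum(p[u] ** 2 for u in range(n) if R[u][i] is not None)
            if den:
                q[i] = num / den
    return p, q

R = [[4, 2, None],
     [2, 1, 1],
     [None, 2, 4]]
p, q = rank1_alternation(R)
missing_estimate = p[0] * q[2]  # reconstructed value for the (0, 2) hole
```

Each half-step is an exact least-squares solve, so the residual on the known cells never increases from one sweep to the next.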

Page 26: PhD defense

26

Advantages:
limited computational cost and good quality recommendations (Sarwar);
good algorithms and high accuracy (Koren);
the Alternation Algorithm deals with missing values and requires modest computational resources (Julià).

Problems:
the technique is not applicable on frequently updated databases (Sarwar);
the models are not justified by a formal model (previous ratings are not explained) (Koren);
r known values are required in each row/column (Julià).

[Sarwar] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems. 5th International Conference on Computer and Information Technology (ICCIT), 2002.
[Koren] Yehuda Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. ACM Int. Conference on Knowledge Discovery and Data Mining (KDD'08), 2008.
[Julià] Carme Julià, Angel D. Sappa, Felipe Lumbreras, Joan Serrat, and Antonio López. Predicting Missing Ratings in Recommender Systems: Adapted Factorization Approach. International Journal of Electronic Commerce, 2009.

SVD in RS Literature 2/2

Page 27: PhD defense

27

Summary

• We analyzed MF techniques.
• We focused our attention on SVD techniques.
• The main limitations of MF techniques:
  • they take into account only the standard profile of the users;
  • they do not allow integrating further information, such as the context.

Page 28: PhD defense

28

Matrix 2 Tensor

Matrices and MF cannot be used in a CARS based on a contextual modeling paradigm: context information is used in the recommendation process, and matrices are not adequate for this scope.

We need to introduce tensors.

(Figure: a 3-order tensor with dimensions users × items × contexts, storing <user, item, context, rating> entries.)

Page 29: PhD defense

29

Tensor Factorization: HOSVD and PARAFAC in RS literature

Page 30: PhD defense

30

Tensors

Tensors, higher-dimensional arrays of numbers, might be exploited in order to include additional contextual information in the recommendation process.

In standard multivariate data analysis, data are arranged in a 2D structure, but for a wide variety of domains, more appropriate structures are required for taking into account more dimensions:

x_ijk, i = 1,…,I; j = 1,…,J; k = 1,…,K.

2 particular tensor factorizations can be considered higher-order extensions of matrix Singular Value Decomposition:

1. Higher-Order Singular Value Decomposition (HOSVD), a generalization of matrix SVD;

2. PARallel FACtor analysis / CANonical DECOMPosition (PARAFAC/CANDECOMP), a higher-order form of Principal Component Analysis.

Page 31: PhD defense

31

Tensor Factorization

HOSVD decomposes the initial tensor into N matrices (where N is the order of the tensor) and a tensor whose size is smaller than the original one (the core tensor).

In the RS literature, the most frequently used technique for tensor factorization is HOSVD.

Page 32: PhD defense

32

HOSVD in RS Literature 1/2

Baltrunas: Multiverse Recommendation algorithm; HOSVD-based TF; data: users, movies, contextual information and user ratings in a 3-order tensor.

Rendle: RTF algorithm for social tagging systems; the reconstructed tensor measures the strength of the association between users, items and tags.

Chen: CubeSVD for personalized web search; hidden relationships <user, query, web page>; output <u, q, p, w>, where w measures the popularity of page p as a result of query q made by user u.

Page 33: PhD defense

33

HOSVD in RS Literature 2/2

Advantages:
good algorithm with improvement of the results (Baltrunas);
good algorithm with improvement of the results (Rendle);
CubeSVD, tested on MSN clickthrough data, gives good results (Chen).

Problems:
high computational cost (all);
time-consuming algorithm (Chen).

[Baltrunas] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the fourth ACM conference on Recommender systems, RecSys '10, pages 79–86, New York, NY, USA, 2010. ACM.
[Rendle] Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD, pages 727–736, 2009.
[Chen] Jian-Tao Sun, Hua-Jun Zeng, Huan Liu, Yuchang Lu, and Zheng Chen. CubeSVD: a novel approach to personalized web search. In Proceedings of the 14th international conference on World Wide Web, WWW '05, pages 382–390, New York, NY, USA, 2005. ACM.

Page 34: PhD defense

34

PARAFAC (PARallel FACtor analysis)

PARAFAC (PARallel FACtor analysis) is a decomposition method, independently proposed by Harshman and by Carroll and Chang.

A PARAFAC model of a 3D array is given by 3 loading matrices A, B, and C, with typical elements a_if, b_jf, and c_kf.
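The PARAFAC model of a 3D array, x_ijk = Σ_f a_if · b_jf · c_kf, can be written down directly (a plain illustration of the model, not any cited implementation):

```python
def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from PARAFAC loading matrices A (I x F),
    B (J x F), C (K x F): x[i][j][k] = sum_f A[i][f] * B[j][f] * C[k][f]."""
    I, F = len(A), len(A[0])
    J, K = len(B), len(C)
    return [[[sum(A[i][f] * B[j][f] * C[k][f] for f in range(F))
              for k in range(K)] for j in range(J)] for i in range(I)]

# rank-1 toy example: the outer product of three vectors
A = [[1.0], [2.0]]  # users
B = [[3.0], [4.0]]  # items
C = [[5.0], [6.0]]  # contexts
X = cp_reconstruct(A, B, C)
```

With F components, X is the sum of F such outer products, which is exactly the PARAFAC/CANDECOMP model.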

Page 35: PhD defense

35

HOSVD Vs PARAFAC

HOSVD advantages:
• HOSVD is an extension of SVD to higher-order dimensions;
• it is able to take more dimensions into account simultaneously;
• better data modeling than standard SVD;
• dimension reduction can be performed not only in one dimension but also separately for each dimension.

HOSVD drawbacks:
• it is not an optimal tensor decomposition; HOSVD does not require an iterative algorithm, but needs standard SVD computation only;
• it does not have the truncation property of SVD, where truncating to the first n singular values gives the best rank-n approximation of a given matrix;
• HOSVD cannot deal with missing values: they are treated as 0;
• to prevent overfitting, HOSVD should use regularization.

Page 36: PhD defense

36

PARAFAC Vs HOSVD

PARAFAC:
• is faster than HOSVD: linear computation time in comparison to HOSVD;
• does not collapse the data, but retains their natural three-dimensional structure;
• despite the PARAFAC model's lack of orthogonality, Kruskal showed that its components are unique, up to permutation and scaling, under mild conditions.

Page 37: PhD defense

37

PARAFAC in [Baltrunas12]

TFMAP uses PARAFAC for top-N context-aware recommendations of mobile applications. A tensor of 3 dimensions is factorized:
• users
• items
• context types.

The 3 factor matrices are used to calculate user m's preference for item i under context type k.

The authors introduce an optimization process using gradient ascent to avoid overfitting.

[Baltrunas12] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, and Nuria Oliver. TFMAP: optimizing MAP for top-n context-aware recommendation. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, SIGIR '12, pages 155–164, New York, NY, USA, 2012. ACM.

Page 38: PhD defense

38

PARAFAC in [Baltrunas12]

Advantages:
TFMAP, tested on the Appazaar project dataset, improves MAE and Precision compared to other algorithms;
good scalability: the training time of TFMAP increases almost linearly.

Problems:
TFMAP is tested on only 1 dataset;
significance of the results?

Page 39: PhD defense

39

PARAFAC in [Acar]

PARAFAC goal: to capture the latent structure of the data via a higher-order factorization, even in the presence of missing data. The authors develop a scalable algorithm called CP-WOPT (CP Weighted OPTimization).

Numerical experiments on simulated data sets show that CP-WOPT can successfully factor tensors with noise and up to 70% missing data.

[Acar] Evrim Acar, Daniel M. Dunlavy, Tamara G. Kolda, and Morten Mørup. Scalable tensor factorizations with missing data. In SDM10: Proceedings of the 2010 SIAM International Conference on Data Mining, pages 701–712, Philadelphia, April 2010. SIAM.

Page 40: PhD defense

40

PARAFAC in [Acar]

CP-WOPT is tested on an EEG dataset:
• it is not uncommon in EEG analysis that the signals from some channels are ignored due to the malfunctioning of the electrodes;
• the factors extracted by the CP-WOPT algorithm can capture brain dynamics in EEG analysis even if signals are missing from some channels.

Page 41: PhD defense

41

PARAFAC in [Acar]

Advantages:
CP-WOPT deals with missing values;
CP-WOPT uses a weighted factorization based on PARAFAC;
good results on the tested dataset.

IDEA: CP-WOPT for RS.

Problems:
computational cost?

Page 42: PhD defense

42

Proposed solutions for missing values and context weighting

Page 43: PhD defense

43

Scenario

CARS represent an evolution of the traditional CF paradigm.

The state of the art is based on TF as a generalization of the classical user-item MF that accommodates the contextual information.

We are interested in the PARAFAC technique for its ability to deal with missing values.

We will propose the use of the algorithm CP-WOPT: our target is to identify the most promising method of factorization (PARAFAC) and the best algorithm implementing this factorization.

We propose 2 solutions to the problem of context weighting.

Page 44: PhD defense

44

CP-WOPT Algorithm

(Slide shows: the weight tensor W marking the known entries, the rank of the tensor X, and the gradient matrices of the objective.)
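The CP-WOPT objective from [Acar] scores only the observed cells: f(A, B, C) = ½ Σ_ijk w_ijk (x_ijk − Σ_f a_if b_jf c_kf)², with w_ijk = 1 where x_ijk is known and 0 where it is missing. A plain-Python sketch of the objective (the full algorithm also derives the gradient matrices for A, B and C and feeds them to a gradient-based optimizer, omitted here):

```python
def cpwopt_loss(X, W, A, B, C):
    """Weighted CP objective: missing cells (w = 0) contribute nothing,
    so the factors are fitted to the observed entries only."""
    F = len(A[0])
    loss = 0.0
    for i in range(len(A)):
        for j in range(len(B)):
            for k in range(len(C)):
                est = sum(A[i][f] * B[j][f] * C[k][f] for f in range(F))
                loss += 0.5 * W[i][j][k] * (X[i][j][k] - est) ** 2
    return loss

# exact rank-1 factors of a tiny 1 x 2 x 2 tensor; the last cell is unknown
A = [[2.0]]
B = [[1.0], [3.0]]
C = [[1.0], [2.0]]
X = [[[2.0, 4.0], [6.0, 0.0]]]  # 0.0 is just a placeholder for the hole
W = [[[1, 1], [1, 0]]]          # w = 0 marks the missing entry
```

With the weight set to 0 the placeholder value is ignored, which is exactly why CP-WOPT handles sparsity where plain least-squares CP would treat the 0 as data.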

Page 45: PhD defense

45

Implementation Details

The CP-WOPT algorithm is implemented in Java.

The input tensor is given as a CSV file.

Values range from 1 to 5.

Missing values are conventionally represented as 0.

The output, the approximation of the input tensor with the reconstructed missing data, is stored into a CSV file.

Values less than 0 are normalized to 0.
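The input/output conventions above can be sketched as follows (the thesis implementation is in Java; this is a Python illustration, and the CSV column layout `user,item,context,rating` is my assumption):

```python
import csv
import io

def load_tensor_csv(text, shape):
    """Load a sparse 3-way rating tensor from CSV rows (user, item, context,
    rating); cells that never appear stay 0, the convention for missing
    values stated on the slide."""
    n_users, n_items, n_ctx = shape
    X = [[[0.0] * n_ctx for _ in range(n_items)] for _ in range(n_users)]
    for u, i, c, r in csv.reader(io.StringIO(text)):
        X[int(u)][int(i)][int(c)] = float(r)
    return X

def clamp_output(X):
    """Post-process the reconstructed tensor: values below 0 are set to 0,
    mirroring the normalization rule on the slide."""
    return [[[max(0.0, v) for v in row] for row in mat] for mat in X]

X = load_tensor_csv("0,0,0,5\n0,1,1,3\n", (1, 2, 2))
```

The same CSV round-trip applies on output: the reconstructed tensor is clamped and written back row by row.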

Page 46: PhD defense

CWBPA (Context Weighting with Bayesian Probabilistic Approach) 1/4

Idea: Conditional Probability + Bayes' Theorem.

1) Compute the conditional probability for each user and each context.

2) Compare this distribution with an equiprobable distribution through a divergence measure.

• If the 2 distributions are similar, the context does not influence the user's rating;
• if they are very different, the rating is influenced by the context where the divergence measure is the highest.

Page 47: PhD defense

47

CWBPA 2/4

Assumption: liking (= rating) is influenced by the context.

E.g.: c_i = "weather", with values c_ij = "clearly", "sunny", "cloudy", "rainy".

A contingency table is built for the context c_i against the liking variable L: n tables (one per context) for each user.

Page 48: PhD defense

48

CWBPA 3/4

P(c_i = c_ij | L = 1), j = 1,…,m_i, is computed via Bayes' Theorem.

Page 49: PhD defense

49

CWBPA 4/4

• Comparing the 2 distributions: are they divergent?
• Degree of divergence: divergence index.

DEF.: given 2 distributions A and B, both referring to the same quality character X, calling f^A_k and f^B_k the relative frequencies related to the modality k, k = 1,…,K, of the A and B distributions, a possible family of divergence indexes is:
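A sketch of the CWBPA idea: estimate P(context value | L = 1) by counting, then measure how far it is from the equiprobable distribution. The liking threshold (rating ≥ 4) and the mean-absolute-deviation index are my placeholders for the thesis' divergence-index family:

```python
from collections import Counter

def liked_context_distribution(events, like_min=4):
    """P(context value | L = 1): relative frequency of each context value
    among the ratings the user liked (here: rating >= like_min)."""
    liked = [ctx for ctx, rating in events if rating >= like_min]
    total = len(liked)
    counts = Counter(liked)
    return {c: n / total for c, n in counts.items()}

def divergence_from_uniform(dist, values):
    """Mean absolute deviation between the observed distribution and the
    equiprobable one (1/K per value); near 0 means the context looks
    irrelevant to the user's liking."""
    k = len(values)
    return sum(abs(dist.get(v, 0.0) - 1.0 / k) for v in values) / k

# toy data: (context value, rating) pairs for one user, context = "weather"
events = [("sunny", 5), ("sunny", 4), ("rainy", 2), ("sunny", 5), ("cloudy", 4)]
dist = liked_context_distribution(events)
d = divergence_from_uniform(dist, ["sunny", "cloudy", "rainy"])
```

A high d flags the context variable as influential for this user; a d near 0 leaves it out of the reduced tensor.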

Page 50: PhD defense

50

CWAIC (Context Weighting Association Index Calculation) 1/2

• Idea: for each user and each context, we calculate Cramér's Association Index between liking and context.
• Objective: to determine whether the context influences the rating.
• We establish a threshold under which there is no rating-context dependency, and over which there is influence or dependency.
• Association measures are based on the value of X², obtained from an r x c contingency table.
• The X² test is helpful to verify the independence hypothesis (corresponding to a zero association) between:
  • the modalities of the row variable
  • the modalities of the column variable.

Page 51: PhD defense

51

CWAIC 2/2

Cramér's Index Φc is computed from a contingency table of dimensions r x c. It is based on X², the most widely applied index for association measures. With N the total number of observations and k = min(r, c), it is calculated as:

Φc = sqrt(X² / (N · (k − 1)))

Φc = 0: no association;
Φc = 1: perfect correlation, but only if the table is square.
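Cramér's index can be computed from a contingency table in a few lines (a straightforward sketch of the formula above):

```python
import math

def cramers_v(table):
    """Cramer's index for an r x c contingency table:
    Phi_c = sqrt(chi2 / (N * (min(r, c) - 1)))."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_tot[i] * col_tot[j] / n
            if expected:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(r, c) - 1)))

# liking perfectly determined by the context value -> maximum association
perfect = cramers_v([[10, 0], [0, 10]])
# liking independent of the context value -> zero association
independent = cramers_v([[5, 5], [5, 5]])
```

In CWAIC the returned value is compared against the chosen threshold to decide whether the context variable enters the reduced tensor.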

Page 52: PhD defense

52

Using CWBPA and CWAIC

(Diagram: the tensor with all contexts is processed by CWBPA or CWAIC, which labels each contextual variable as influential or not influential; the output is a REDUCED TENSOR containing only the influential variables, which is then factorized with CP-WOPT.)

Page 53: PhD defense

53

Experimental Evaluation

Page 54: PhD defense

Evaluation of RS 1/3

Standard metrics have been defined by judging how much the predictions deviate from the actual ratings.

Predictive accuracy metrics:

Mean Absolute Error (MAE): this metric measures the deviation between the prediction and the actual rating provided by the user.

Root Mean Squared Error (RMSE): follows the same principle as MAE, but it squares the error before summing. Consequently, it penalizes large errors, since they become much more pronounced than small ones.
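Both metrics in a few lines (toy ratings, for illustration only):

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average |r - r_hat| over the test ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: squares each deviation before averaging,
    so large errors weigh more than in MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]
err_mae = mae(actual, predicted)    # (1 + 0 + 1 + 2) / 4 = 1.0
err_rmse = rmse(actual, predicted)  # sqrt((1 + 0 + 1 + 4) / 4)
```

By construction RMSE ≥ MAE on the same data, with equality only when all errors have the same magnitude.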

Page 55: PhD defense

55

Classification metrics: these metrics evaluate how well a RS can split the item space into relevant and non-relevant items.

Precision: this metric counts how many items among the recommended ones are actually relevant for the target user.

Recall: this metric counts how many items among those that are relevant for the target user are actually recommended.

Evaluation of RS 2/3

                   | Recommended Content | NOT Recommended Content
Relevant Content   | True Positive (TP)  | False Negative (FN)
Irrelevant Content | False Positive (FP) | True Negative (TN)
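Set-based Precision and Recall over a recommendation list (a minimal sketch; the item ids are made up):

```python
def precision_recall(recommended, relevant):
    """Precision = TP / (TP + FP): share of recommended items that are
    relevant. Recall = TP / (TP + FN): share of relevant items that were
    actually recommended."""
    rec, rel = set(recommended), set(relevant)
    tp = len(rec & rel)
    return tp / len(rec), tp / len(rel)

p, r = precision_recall(recommended=[1, 2, 3, 4], relevant=[2, 4, 5])
```

Truncating `recommended` to its first N entries before the call gives the P@N and R@N variants used in the experiments below.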

Page 56: PhD defense

56

F-Measure: a metric defined as the harmonic mean of the precision and recall metrics. Let β be a parameter that determines the relative influence of precision and recall; the F-Measure is calculated as:

F_β = (1 + β²) · P · R / (β² · P + R)

In our evaluation β = 1.
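The F-Measure with its β parameter (a direct transcription of the formula; β = 1 reduces it to the plain harmonic mean):

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta sets their
    relative influence (beta = 1 weighs them equally)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero on a degenerate result
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

β > 1 favours recall, β < 1 favours precision.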

Evaluation of RS 3/3

Page 57: PhD defense

57

Introduction 1/2

• 3 preliminary tests of CP-WOPT to verify the effectiveness of the algorithm and to evaluate standard metrics;
• 1 evaluation without context;
• 2 evaluations to test our solutions CWBPA and CWAIC for context weighting.

Page 58: PhD defense

58

Introduction 2/2

Why 2 Baselines?

• 1 without contextual information on 1 dataset;
• 1 with all contextual information available on the same dataset.

Do the proposed solutions work as a "filter" for contextual information?

Page 59: PhD defense

CP-WOPT: preliminary evaluations 1/5

Preliminary user study:
• 7 real users
• rated a fixed number of movies (11)
• 3 contextual factors.

The 3 contextual factors:
i) whether they like to watch the movie at home or at the cinema;
ii) with friends or with a partner;
iii) with or without family.

Ratings range 1-5, with an "encoding" of the context into the rating:
• ratings 1 and 2 express a strong and a modest preference, respectively, for the first context term;
• rating 3 expresses neutrality;
• ratings 4 and 5 express a modest and a strong preference, respectively, for the second context term.

Page 60: PhD defense

60

CP-WOPT: preliminary evaluations 2/5

Metrics used: accuracy – coverage.

Accuracy: the percentage of known values correctly reconstructed.

Coverage: the percentage of non-zero values returned.
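A sketch of the two metrics; note that "correctly reconstructed" is my reading (the rounded prediction equals the known rating), since the slide's formulas were images:

```python
def accuracy_coverage(known, reconstructed):
    """accuracy: share of known ratings whose rounded reconstruction
    matches (my interpretation of the slide);
    coverage: share of returned values that are non-zero."""
    correct = sum(1 for k, r in zip(known, reconstructed) if round(r) == k)
    nonzero = sum(1 for r in reconstructed if r != 0)
    return correct / len(known), nonzero / len(reconstructed)

# toy check: 3 of 4 round to the known rating, 3 of 4 outputs are non-zero
acc, cov = accuracy_coverage([5, 3, 4, 2], [4.8, 3.2, 0.0, 2.4])
```

In the actual evaluation, `known` would be the 30% of ratings hidden from the factorization and `reconstructed` the corresponding CP-WOPT outputs.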

Page 61: PhD defense

61

The experiment shows that it is possible to express, through the n-dimensional factorization, not only recommendations to the single user, but also more general considerations such as the mode of using an item, i.e. its trend of use.

CP-WOPT: preliminary evaluations 3/5

Page 62: PhD defense

62

CP-WOPT: preliminary evaluations 4/5

• Dataset used: subset of MovieLens 100K.
• Input: tensor of dimensions 100 users x 150 movies x 21 occupations.
• Contextual information: occupation (the only information available in the dataset usable as context).
• Results:
  • acc = 92.09%
  • cov = 99.96%
  • MAE = 0.60
  • RMSE = 0.93.

Accuracy is acceptable; coverage is very good.

Page 63: PhD defense

63

CP-WOPT: preliminary evaluations 5/5

Baseline: MyMediaLite* RS
• UserItem-Baseline: CF algorithm
• SVDPlusPlus: MF algorithm based on Singular Value Decomposition

* http://www.mymedialite.net

Page 64: PhD defense

64

Evaluation of an explicit context dataset

Dataset: LDOS-CoMoDa**

LDOS-CoMoDa contains:
• ratings for the movies;
• 12 pieces of contextual information describing the situation in which the movies were watched.

Properties:
• the ratings and the contextual information are explicitly acquired from the users immediately after they consumed the item;
• the ratings and the contextual information come from real user-item interaction;
• users are able to rate the same item more than once if they consumed the item multiple times.

** www.ldos.si/comoda.html

Page 65: PhD defense

65

The LDOS-CoMoDa dataset has been in development since 15 September 2010. It contains 3 main groups of information:

general user information: provided by the user upon registering in the system (user's age, sex, country and city);

item metadata: inserted into the dataset for each movie rated by at least one user (director's name and surname, country, language, year);

contextual information.

LDOS-CoMoDa

Page 66: PhD defense

66

We experimented with CP-WOPT on the LDOS-CoMoDa dataset with ALL CONTEXTS selected (19 contextual features).

Accuracy metrics: we use 70% of the ratings, replacing 30% of the known ratings with zero values. The 30% is randomly chosen.

Evaluation on explicit context dataset 1/2

Page 67: PhD defense

67

Evaluation of explicit context dataset 2/2

(Chart: RMSE for CAMF (CAMF_C), DCW (1.017), Splitting Approaches (UI Splitting) and CP-WOPT; y-axis from 0 to 1.2.)

Page 68: PhD defense

Baseline without context

This experiment aims at creating a baseline from standard recommendation algorithms which do not exploit contextual information, so we want to use a 2D recommender.

For this purpose we run Mahout algorithms on the LDOS-CoMoDa dataset.

The Mahout recommender requires an input file of data. We use a CSV file where the users' ratings assigned under some contextual situations are stored.

We neglect the contextual information.

We remove the ratings given on the same item under different contexts: we consider the first rating in temporal order, ignoring the others.

We rearrange the data as triplets: <id user, id item, rating>.

Page 69: PhD defense

69

Mahout algorithms compared

Some standard collaborative filtering algorithms are compared:
• Singular Value Decomposition
• algorithms based on several user similarity measures (Spearman Correlation, Pearson Correlation, Euclidean Distance, Tanimoto Coefficient)
• algorithms based on item similarity (Log Likelihood, Euclidean Distance, Pearson Correlation)
• the Slope One Recommender.

For user similarity we use a neighborhood of 10 users to calculate the similarity between users.

We use 60% of the data as training set and 40% as test set.

Page 70: PhD defense

Experimental Evaluation 1/6

(Chart: MAE and RMSE for SVD, Pearson/Euclidean/Tanimoto/Spearman User Similarity, Euclidean/Pearson/Tanimoto/LogLikelihood Item Similarity and SlopeOne; y-axis from 0.00 to 1.80.)

Page 71: PhD defense

71

Experimental Evaluation 2/6

(Chart: P@5, R@5 and F-score@5 for the same algorithms; y-axis from 0.00 to 0.10.)

Page 72: PhD defense

72

Experimental Evaluation 3/6

(Chart: P@10, R@10 and F-score@10 for the same algorithms; y-axis from 0.00 to 0.14.)

Page 73: PhD defense

73

Experimental Evaluation 4/6

(Chart: P@20, R@20 and F-score@20 for the same algorithms; y-axis from 0.00 to 0.20.)

Page 74: PhD defense

74

Experimental Evaluation 5/6

(Chart: P@50, R@50 and F-score@50 for the same algorithms; y-axis from 0.00 to 0.25.)

Page 75: PhD defense

75

In general, the low values are due to the fact that the methodology used for evaluating the ranked item lists includes unrated items in the test set.

These items are tagged as not relevant, therefore likely leading to underestimated performance compared to a situation where all ratings are available.

This is not a problem in our evaluation, since the goal is just to compare algorithms, and performance is equally underestimated for all of them.

We select the Spearman User Similarity algorithm, which gave the lowest error, and the Euclidean User Similarity algorithm, which gave the best accuracy, as baselines.
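The ranked-list protocol described above can be sketched as follows; this is a minimal illustration in which items absent from the relevant set (e.g. unrated ones) count as not relevant, exactly the source of the underestimation noted above:

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall at cut-off k for one user's ranked item list.

    Items missing from `relevant` (e.g. unrated items in the test set)
    are treated as not relevant.
    """
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: 2 of the top-5 recommendations are relevant
p, r = precision_recall_at_k(["a", "b", "c", "d", "e"], {"a", "c", "f"}, k=5)
# p = 0.4, r = 2/3
```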

Experimental Evaluation 6/6

Page 76: PhD defense

76

LDOS-CoMoDa dataset: d = 19 contextual features. Users' ratings with context information are stored in a CSV file.

We use 70% of the ratings, replacing the remaining 30% of the known ratings with zero values. The 30% of values is randomly chosen.
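The masking step can be sketched as follows; the dict-based representation and the function name are illustrative assumptions, not the thesis code:

```python
import random

def mask_ratings(ratings, holdout=0.30, seed=42):
    """Randomly hide a fraction of the known ratings by setting them to zero.

    `ratings` maps (user, item, context) keys to rating values. Returns the
    masked training copy and the held-out test entries.
    """
    rng = random.Random(seed)
    keys = list(ratings)
    test_keys = set(rng.sample(keys, int(len(keys) * holdout)))
    train = {k: (0 if k in test_keys else v) for k, v in ratings.items()}
    test = {k: ratings[k] for k in test_keys}
    return train, test

# Toy tensor: 5 users x 4 items x 2 contexts, ratings in 1..5
ratings = {(u, i, c): 1 + (u + i + c) % 5
           for u in range(5) for i in range(4) for c in range(2)}
train, test = mask_ratings(ratings)
# 30% of the 40 known ratings (12 entries) are zeroed in `train`
```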

CW Evaluation: Preliminary Phase

CW Proposed Solutions

Reduced Tensor

Page 77: PhD defense

77

CWBPA Evaluation 1/2

This experiment is performed to test the 2 proposed solutions, CWBPA and CWAIC, for context weighting. We apply the 2 methods on the LDOS-CoMoDa dataset, evaluating the standard metrics MAE, RMSE, accuracy, coverage, P and R.

Contingency table L=1

We compare the probability distribution obtained from the previous calculations with the uniform probability distribution 1/K, where K is the number of context variables. Divergence measure:
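The divergence formula itself was in a figure that did not survive extraction. Purely as an illustrative assumption (the specific measure used by CWBPA is not stated here), a divergence from the uniform distribution 1/K could be computed with the Kullback-Leibler divergence:

```python
import math

def kl_from_uniform(p):
    """Kullback-Leibler divergence D(p || u) from the uniform distribution.

    Illustrative assumption only: CWBPA's actual divergence measure may
    differ. `p` is a probability distribution over K context variables.
    """
    k = len(p)
    # D(p || u) = sum_i p_i * log(p_i / (1/k)) = sum_i p_i * log(p_i * k)
    return sum(pi * math.log(pi * k) for pi in p if pi > 0)

d = kl_from_uniform([0.7, 0.1, 0.1, 0.1])
# d > 0: the distribution deviates from uniform, so some contexts matter more
```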

Page 78: PhD defense

78

CWBPA Evaluation 2/2

Page 79: PhD defense

79

Contingency table L=1 for each context and each user.

For each table we calculate the χ² coefficient and Cramér's index, and compare it against a threshold.
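The χ² and Cramér's index computation on a contingency table can be sketched as follows; the helper name and the toy table are illustrative, not from the thesis:

```python
import math

def cramers_v(table):
    """Cramer's V association index for a 2D contingency table.

    Computes the chi-square statistic against the independence hypothesis,
    then normalizes it by the sample size and the smaller table dimension.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(table), len(table[0]))  # smaller table dimension
    return math.sqrt(chi2 / (n * (k - 1)))

# Toy example: ratings (rows) vs. a binary context variable (columns)
v = cramers_v([[10, 2], [3, 15]])
# v is in [0, 1]; larger values indicate a stronger association,
# so contexts whose index exceeds the threshold are kept
```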

CWAIC Evaluation

Page 80: PhD defense

80

CWBPA Vs CWAIC

7 runs of the 2 algorithms: 4 for CWBPA, 3 for CWAIC. We select the most significant contextual configurations.

Page 81: PhD defense

81

CWBPA Vs CWAIC 1/2

[Bar chart: MAE and RMSE for CWAIC, CWBPA and CP-WOPT, with Spearman User Similarity and Euclidean User Similarity as baselines]

Page 82: PhD defense

82

CWBPA Vs CWAIC 2/2

[Bar chart: P and R for the compared algorithms]

Page 83: PhD defense

83

CWBPA Vs CWAIC – All users 1/2

[Bar chart: MAE and RMSE for CWAIC, CWBPA and CP-WOPT on all users, with Spearman User Similarity and Euclidean User Similarity as baselines]

Page 84: PhD defense

84

CWBPA Vs CWAIC – All users 2/2

[Bar chart: P and R for CWAIC, CWBPA and CP-WOPT on all users, with Spearman User Similarity and Euclidean User Similarity as baselines]

Page 85: PhD defense

85

Result Analysis 1/2

• Evaluated the CP-WOPT algorithm as a possible solution to the missing values problem:
• with a small dataset and on a MovieLens 100K subset we obtained good results, with a low error and a good coverage value; CP-WOPT is able to reconstruct the tensor, leaving only a few values as missing data;

• On MovieLens: the results reached are in line with those known in the literature;

• CP-WOPT on the LDOS-CoMoDa dataset is better than other state-of-the-art recommendation algorithms;

• Neglecting the contextual information by using a regular 2D RS, the CF algorithms Spearman User Similarity and Euclidean User Similarity provided better performance.

Page 86: PhD defense

86

• CWBPA and CWAIC give different responses to the problem of context weighting;

• CWBPA and CWAIC are evaluated on LDOS-CoMoDa dataset, showing their effectiveness;

• Using only some contextual variables leads to more precise recommendations;

• CWAIC has better performance than CWBPA.

Result Analysis 2/2

Page 87: PhD defense

87

Summary and Future Work

Page 88: PhD defense

88

Recap

Information Overload

Page 89: PhD defense

89

Recap

Recommender Systems

Page 90: PhD defense

90

Recap

CF MF Tensors

TF - Context

Proposals: CP-WOPT, CWBPA, CWAIC

Page 91: PhD defense

91

Recap – Experimental Evaluation

5 Evaluations to test:

• Effectiveness of CP-WOPT in RS;
• the 2 proposed solutions for context weighting:
• both approaches seem effective;
• using only relevant contexts leads to better recommendations compared to a traditional 2D RS or to using all available contextual information.

Page 92: PhD defense

92

Future Work 1/3

LDOS-CoMoDa dataset: experiment on all available contextual variables.

• 12 contextual variables in the LDOS-CoMoDa dataset;

• We used only 5 of them to reduce the computational effort;

• New extended evaluation of the Bayesian Probabilistic Approach and of the Association Index to minimize the dimensions of the tensor.

Page 93: PhD defense

93

Future Work 2/3

Test on another contextual dataset.

We want to test CP-WOPT, CWBPA and CWAIC on other datasets having explicit contextual information such as:

• AIST Food dataset
• TripAdvisor dataset

to improve the significance of the results.

Page 94: PhD defense

94

Future Work 3/3

A Real Application.

We want to implement a web-based system to acquire data and test our proposed solutions in a concrete scenario, such as:

Personalized Context-Aware Electronic Program Guides.

Page 95: PhD defense

95

Publications

Most of the work presented is collected in the following publications:

Giuseppe Ricci, Marco de Gemmis, Giovanni Semeraro. Matrix and Tensor Factorization Techniques Applied to Recommender Systems: a Survey. International Journal of Computer and Information Technology (2277-0764), Volume 01, Issue 01, September 2012.

Giuseppe Ricci, Marco de Gemmis, Giovanni Semeraro. Mathematical Methods of Tensor Factorization Applied to Recommender Systems. New Trends in Databases and Information Systems, 17th East European Conference on Advances in Databases and Information Systems, Volume 241, ISBN 978-3-319-01862-1, 2013, pp. 383-388.

Results of the experimental evaluation are in phase of submission.

Page 96: PhD defense

96

Questions?

“In things which are absolutely indifferent there can be no choice and consequently no option or will.”

Gottfried Wilhelm von Leibniz