Page 1
1 RecSys 2012, Dublin, Ireland, September 13, 2012
CLiMF: Learning to Maximize Reciprocal Rank with
Collaborative Less-is-More Filtering
Yue Shi(a), Alexandros Karatzoglou(b) (Presenter), Linas Baltrunas(b), Martha Larson(a), Alan Hanjalic(a), Nuria Oliver(b)
(a) Delft University of Technology, Netherlands
(b) Telefonica Research, Spain
Page 2
Top-k Recommendations
Page 3
Implicit feedback
Page 4
Models for Implicit feedback
• Classification or learning to rank
• Binary pairwise ranking loss functions (hinge, AUC loss)
• Sample from the non-observed/irrelevant entries
[Figure: a pairwise training example, "Friend" vs. "Not a Friend"]
Page 5
Learning to Rank in CF
• The point-wise approach
• Reduce ranking to a regression, classification, or ordinal regression problem [OrdRec@RecSys 2011]
• f(user, item) → R
• The pairwise approach
• Reduce ranking to pairwise classification [BPR@UAI 2009]
• f(user, item1, item2) → R
• The list-wise approach
• Direct optimization of IR measures, list-wise loss minimization [CoFiRank@NIPS 2008]
• f(user, item1, . . . , itemn) → R
Page 6
Ranking metrics
List-wise ranking measures for implicit feedback [TFMAP@SIGIR 2012]:
Average precision:  AP = (1/|S|) Σ_{k=1}^{|S|} P(k), where S is the set of relevant items and P(k) is the precision at the position of the k-th relevant item
Reciprocal rank:  RR = 1/rank_i, where rank_i is the position of the highest-ranked relevant item
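A minimal sketch of the two metrics above in Python; the input `relevant` (a binary list over ranked positions, 1 marking a relevant item) and both function names are illustrative, not from the slides.

```python
def average_precision(relevant):
    """AP = (1/|S|) * sum of P(k) over the positions of relevant items,
    where P(k) is precision at rank k and |S| the number of relevant items."""
    hits = 0
    precisions = []
    for k, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(relevant):
    """RR = 1 / rank of the first relevant item (0 if there is none)."""
    for k, rel in enumerate(relevant, start=1):
        if rel:
            return 1.0 / k
    return 0.0

# Relevant items at ranks 1 and 3: AP = (1/1 + 2/3)/2, RR = 1/1
print(average_precision([1, 0, 1, 0]))
print(reciprocal_rank([1, 0, 1, 0]))
```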
Page 7
Ranking metrics
[Worked example on a ranked list: AP = 1, RR = 1]
Page 8
Ranking metrics
[Worked example on a ranked list: AP = 0.66, RR = 1]
Page 9
Less is more
• Focus on the very top of the list
• Try to get at least one interesting item at the top of the list
• MRR is a particularly important measure in domains that provide users with only a few recommendations, e.g. Top-3 or Top-5
Page 10
Ingredients
• What kind of model should we use? A factor model.
• Which ranking measure should we optimize for a good Top-k recommender? MRR captures the quality of Top-k recommendations.
• But MRR is not smooth, so what can we do? We can perhaps find a smooth version of MRR.
• How do we ensure the proposed solution is scalable? With a fast learning algorithm (SGD); smoothness gives us gradients.
Page 11
Model
[Figure: user-item matrix with observed relevant entries marked +; each user i and item j is mapped to a latent factor vector]
f_ij = ⟨U_i, V_j⟩
Page 12
The Non-smoothness of Reciprocal Rank
• Reciprocal Rank (RR) of a ranked list of items for a given user
RR_i = Σ_{j=1}^{N} (Y_ij / R_ij) Π_{k=1}^{N} (1 − Y_ik I(R_ik < R_ij))

where Y_ij ∈ {0, 1} indicates relevance, R_ij is the rank of item j for user i, and I(·) is the indicator function
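The RR_i formula on this slide can be computed directly; a sketch, assuming `Y` is a binary relevance list and `R` holds the (1-based) rank of each item for user i (both names are illustrative).

```python
def rr_nonsmooth(Y, R):
    """Reciprocal rank as on the slide: sum over relevant items j of
    (Y_ij / R_ij) times a product that zeroes every term except the one
    for the highest-ranked relevant item."""
    total = 0.0
    for j in range(len(Y)):
        if not Y[j]:
            continue
        prod = 1.0
        for k in range(len(Y)):
            # Factor is 0 when a relevant item k outranks item j.
            prod *= 1.0 - Y[k] * (1.0 if R[k] < R[j] else 0.0)
        total += (Y[j] / R[j]) * prod
    return total

# Relevant items at ranks 2 and 4: only rank 2 survives, RR = 1/2
print(rr_nonsmooth([0, 1, 0, 1], [1, 2, 3, 4]))
```

Only the top-ranked relevant item survives the product, so the sum collapses to 1/rank of that item; because the ranks R are piecewise constant in the scores, the whole expression is non-smooth.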
Page 13
Non-smoothness
[Figure: ranked list of 8 items with scores F_i = 0.81, 0.75, 0.64, 0.61, 0.58, 0.55, 0.49, 0.43 at ranks 1-8; the highest-ranked relevant item sits at rank 2, so RR = 0.5]
Page 14
Non-smoothness
[Figure: the scores change (F_i = 0.84, 0.82, 0.56, 0.50, 0.45, 0.40, 0.32, 0.31) but the relevant item stays at rank 2, so RR is still 0.5]
Page 15
Non-smoothness
[Figure: a tiny score change (F_i = 0.85, 0.84, 0.56, 0.50, 0.45, 0.40, 0.32, 0.31) moves the relevant item to rank 1, and RR jumps from 0.5 to 1]
Page 16
Page 17
How can we get a smooth MRR?
• Borrow techniques from learning to rank:
I(R_ik < R_ij) ≈ g(f_ik − f_ij)
1/R_ij ≈ g(f_ij)
where g(x) = 1/(1 + e^{−x}) is the logistic (sigmoid) function
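A minimal sketch of the two surrogates above; the scores `f_ij` and `f_ik` are hypothetical values, not from the slides.

```python
import math

def g(x):
    """Logistic function g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical predicted scores for two items of one user.
f_ij, f_ik = 0.4, 1.2

# Smooth surrogate for the indicator I(R_ik < R_ij): item k outranks
# item j exactly when its score is higher, so compare the scores.
indicator_approx = g(f_ik - f_ij)

# Smooth surrogate for 1/R_ij: a high score stands in for a low rank,
# giving a value near 1 for items the model ranks near the top.
inv_rank_approx = g(f_ij)
```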
Page 18
MRR Loss function
RR_i ≈ Σ_{j=1}^{N} Y_ij g(f_ij) Π_{k=1}^{N} (1 − Y_ik g(f_ik − f_ij))

U_i, V = argmax_{U_i, V} RR_i,   with f_ij = ⟨U_i, V_j⟩

Complexity: O(N²)
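Substituting the logistic surrogates into the reciprocal rank gives a smoothed objective, sketched below; `Y` (binary relevance) and `f` (model scores) are assumed input names.

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def rr_smooth(Y, f):
    """Smooth approximation of RR_i: g(f_ij) replaces 1/R_ij and
    g(f_ik - f_ij) replaces the indicator I(R_ik < R_ij)."""
    total = 0.0
    for j in range(len(Y)):
        if not Y[j]:
            continue
        prod = 1.0
        for k in range(len(Y)):
            # Irrelevant items (Y[k] = 0) contribute a factor of 1.
            prod *= 1.0 - Y[k] * g(f[k] - f[j])
        total += g(f[j]) * prod
    return total
```

The double loop over all item pairs is the O(N²) cost noted on the slide; the function is now differentiable in the scores, so small score changes no longer cause jumps.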
Page 19
MRR loss function II
• Using the concavity and monotonicity of the log function, we obtain a lower bound:

L(U_i, V) = Σ_{j=1}^{N} Y_ij [ ln g(f_ij) + Σ_{k=1}^{N} ln(1 − Y_ik g(f_ik − f_ij)) ]

Since Y zeroes out all terms involving irrelevant items, the cost per user is O(n²) in the number n of relevant items.
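The lower bound above can be sketched directly in terms of relevance and scores; `Y` and `f` are assumed per-item input names, not from the slides.

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def climf_lower_bound(Y, f):
    """The slide's lower bound L(U_i, V) on the (log of the) smoothed
    reciprocal rank, for one user's binary relevance Y and scores f."""
    total = 0.0
    for j in range(len(Y)):
        if not Y[j]:
            continue
        term = math.log(g(f[j]))
        for k in range(len(Y)):
            # ln(1) = 0 for irrelevant k, so only relevant items contribute.
            term += math.log(1.0 - Y[k] * g(f[k] - f[j]))
        total += term
    return total
```

Raising a relevant item's score relative to the others increases the bound, which is the behavior the learner will exploit.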
Page 20
Optimization
• The objective is smooth, so we can compute the gradients ∂E/∂U_i and ∂E/∂V_j
• We use stochastic gradient descent (SGD)
• Overall scalability is linear in the number of relevant items: O(dS)

E(U, V) = Σ_{i=1}^{M} Σ_{j=1}^{N} Y_ij [ ln g(U_iᵀ V_j) + Σ_{k=1}^{N} ln(1 − Y_ik g(U_iᵀ V_k − U_iᵀ V_j)) ] − (λ/2)(‖U‖² + ‖V‖²)
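A sketch of one SGD (ascent) step on a single user's factors. All names are illustrative; the per-user objective here includes only U_i's regularizer (an assumption for this sketch, since V's regularizer would enter through the item updates), and the gradient is re-derived from the lower bound using the identity 1 − g(x) = g(−x), which collapses g′(x)/(1 − g(x)) to g(x). Treat it as a sketch, not the authors' exact update rule.

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective_i(Ui, V, Y, lam):
    """Per-user slice of E(U, V), regularizing only U_i (assumption)."""
    f = [dot(Ui, Vj) for Vj in V]
    total = -0.5 * lam * dot(Ui, Ui)
    for j in range(len(Y)):
        if not Y[j]:
            continue
        term = math.log(g(f[j]))
        for k in range(len(Y)):
            term += math.log(1.0 - Y[k] * g(f[k] - f[j]))
        total += term
    return total

def grad_Ui(Ui, V, Y, lam):
    """Analytic gradient of objective_i w.r.t. U_i."""
    f = [dot(Ui, Vj) for Vj in V]
    grad = [-lam * u for u in Ui]
    for j in range(len(Y)):
        if not Y[j]:
            continue
        for t in range(len(Ui)):
            # d/dU_i of ln g(f_ij) is g(-f_ij) * V_j.
            grad[t] += g(-f[j]) * V[j][t]
        for k in range(len(Y)):
            if not Y[k]:
                continue
            # d/dU_i of ln(1 - g(f_ik - f_ij)) = g(f_ik - f_ij) * (V_j - V_k).
            s = g(f[k] - f[j])
            for t in range(len(Ui)):
                grad[t] += s * (V[j][t] - V[k][t])
    return grad

def sgd_step(Ui, V, Y, lam, lr):
    """One gradient-ascent update of user i's factors."""
    grad = grad_Ui(Ui, V, Y, lam)
    return [u + lr * gr for u, gr in zip(Ui, grad)]
```

The inner loops touch only relevant items, which is what keeps the overall cost linear in the number of observed entries.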
Page 21
What’s different ?
• The CLiMF reciprocal-rank loss essentially pushes the relevant items apart from each other
• In the process, at least one item ends up high in the list
Page 22
Conventional loss
Page 23
CLiMF MRR-loss
Page 24
Experimental Evaluation Data sets
• Epinions:
• 346K observations of trust relationships
• 1,767 users; 49,288 trustees
• 99.85% sparsity
• Avg. friends/trustees per user: 73.34
Page 25
Experimental Evaluation Data sets
• Tuenti:
• 798K observations of trust relationships
• 11,392 users; 50,000 friends
• 99.86% sparsity
• Avg. friends/trustees per user: 70.06
Page 26
Experimental Evaluation Experimental Protocol
[Diagram: per-user data split for the Given 5 protocol; 5 observed friends/trustees per user form the training data, and the remaining friends/trustees are held out as test data]
Page 27
Experimental Evaluation Experimental Protocol
[Diagram: per-user data split for the Given 10 protocol; 10 observed friends/trustees per user form the training data, and the remaining friends/trustees are held out as test data]
Page 28
Experimental Evaluation Experimental Protocol
[Diagram: per-user data split for the Given 15 protocol; 15 observed friends/trustees per user form the training data, and the remaining friends/trustees are held out as test data]
Page 29
Experimental Evaluation Experimental Protocol
[Diagram: per-user data split for the Given 20 protocol; 20 observed friends/trustees per user form the training data, and the remaining friends/trustees are held out as test data]
Page 30
Experimental Evaluation Evaluation Metrics
MRR = (1/|S|) Σ_{i=1}^{|S|} 1/rank_i, averaged over the set S of test users

1-call@5: ratio of test users who have at least one relevant item in their Top-5

P@5: precision over the top 5 recommended items
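The three evaluation metrics can be sketched as follows; `relevance_lists` (one binary list per test user, ordered by predicted rank) is an assumed input name.

```python
def mrr(relevance_lists):
    """Mean reciprocal rank over test users; a user with no relevant
    item in the list contributes 0."""
    total = 0.0
    for rel in relevance_lists:
        for k, r in enumerate(rel, start=1):
            if r:
                total += 1.0 / k
                break
    return total / len(relevance_lists)

def precision_at_5(rel):
    """Fraction of the top-5 positions holding a relevant item."""
    return sum(rel[:5]) / 5.0

def one_call_at_5(relevance_lists):
    """Ratio of test users with at least one relevant item in their Top-5."""
    hits = sum(1 for rel in relevance_lists if any(rel[:5]))
    return hits / len(relevance_lists)
```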
Page 31
Experimental Evaluation Scalability
Page 32
Experimental Evaluation Competition
• Pop: Naive, recommend based on popularity of each item
• iMF (Hu et al., ICDM’08): Optimizes a squared error loss
• BPR-MF (Rendle et al., UAI’09): Optimizes AUC
Page 33
[Bar chart: MRR on Epinions for Pop, iMF, BPR-MF, and CLiMF under the Given 5, 10, 15, and 20 protocols; y-axis from 0.0 to 0.4]
Page 34
[Bar chart: P@5 on Epinions for Pop, iMF, BPR-MF, and CLiMF under the Given 5, 10, 15, and 20 protocols; y-axis from 0.00 to 0.30]
Page 35
[Bar chart: 1-call@5 on Epinions for Pop, iMF, BPR-MF, and CLiMF under the Given 5, 10, 15, and 20 protocols; y-axis from 0.0 to 0.8]
Page 36
[Bar chart: MRR on Tuenti for Pop, iMF, BPR-MF, and CLiMF under the Given 5, 10, 15, and 20 protocols; y-axis from 0.00 to 0.20]
Page 37
[Bar chart: P@5 on Tuenti for Pop, iMF, BPR-MF, and CLiMF under the Given 5, 10, 15, and 20 protocols; y-axis from 0.00 to 0.10]
Page 38
[Bar chart: 1-call@5 on Tuenti for Pop, iMF, BPR-MF, and CLiMF under the Given 5, 10, 15, and 20 protocols; y-axis from 0.00 to 0.30]
Page 39
Conclusions and Future Work
• Contribution
• A novel method for implicit feedback data with some nice properties (Top-k quality, speed)
• Future work
• Use CLiMF to avoid duplicate or very similar recommendations in the top-k part of the list
• Optimize other evaluation metrics for top-k recommendation
• Take the social network of users into account
Page 40
Thank you !
Telefonica Research is looking for interns! Contact: [email protected] or [email protected]