SAC TRECK 2008

the effect of correlation coefficients on

communities of recommenders

neal lathia, stephen hailes, licia capra

department of computer science

university college london

[email protected]

ACM SAC TRECK, Fortaleza, Brazil: March 2008

Trust, Recommendations, Evidence and other Collaboration Know-how

recommender systems:

built on collaboration between users

collaborative filtering research:

design methods to solve problems

1. accuracy, coverage

2. data sparsity, cold-start

3. incorporating tag knowledge

for example,

… a method to classify content correctly

data → intelligent process → predicted ratings

our focus: k-nearest neighbours (kNN)

how do we model kNN collaborative filtering?

a graph of cooperating users

me

nodes = users

links = weighted according to similarity
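A minimal sketch of this graph model, assuming the link weights are already available as a similarity matrix. The class name `KnnGraph`, the method `topK` and the toy weights are illustrative and not taken from the talk. Each user keeps links to the k other users with the highest weights.

```java
import java.util.ArrayList;
import java.util.List;

class KnnGraph {
    // Returns the indices of the k users most similar to `user`,
    // ordered by decreasing link weight. sim[a][b] is the weight of link a-b.
    static List<Integer> topK(double[][] sim, int user, int k) {
        List<Integer> others = new ArrayList<>();
        for (int b = 0; b < sim.length; b++) {
            if (b != user) {
                others.add(b);
            }
        }
        // Highest similarity first.
        others.sort((x, y) -> Double.compare(sim[user][y], sim[user][x]));
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }

    public static void main(String[] args) {
        // Hypothetical symmetric weights for four users; user 0 plays the role of "me".
        double[][] sim = {
            {1.0, 0.57, -0.43, 0.12},
            {0.57, 1.0, 0.87, 0.01},
            {-0.43, 0.87, 1.0, 0.22},
            {0.12, 0.01, 0.22, 1.0}
        };
        System.out.println(topK(sim, 0, 2)); // prints [1, 3]
    }
}
```

Changing the similarity measure changes the matrix, and therefore which users end up in each neighbourhood.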

accuracy, coverage

to answer this question, we need to find the optimal weighting:

the best similarity measure for the dataset, from the many available:

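cosine angle:

$$w_{a,b} = \frac{R_a \cdot R_b}{\|R_a\| \, \|R_b\|}$$

pearson:

$$w_{a,b} = \frac{\sum_i (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_i (r_{a,i} - \bar{r}_a)^2 \sum_i (r_{b,i} - \bar{r}_b)^2}}$$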

[equation not recoverable: a further coefficient of the same Pearson form, with an additional term in N]

and there are more still…

$$w_{a,b} = \frac{\sum_i (r_{a,i} - 2.5)(r_{b,i} - 2.5)}{\sqrt{\sum_i (r_{a,i} - 2.5)^2 \sum_i (r_{b,i} - 2.5)^2}}$$

concordance: proportion of agreement

Somers’ d:

$$w_{a,b} = \frac{C - D}{N - T}$$

concordant: +0.5, +3.0

discordant: -1.5, +1.5

tied: +1.5, +/-?

community view of the graph:

(a very small example)

[figure: graph of users around "me"; each link is labelled with a similarity weight between -1 and 1]

or, put another way:

[figure: the graph around "me" with each link labelled good, bad or none]

what is the best way of generating the graph?

like this?

[figure: graph around "me" with one assignment of good, bad and none links]

or like this?

[figure: graph around "me" with a different assignment of good, bad and none links]

similarity values depend on the method used:

there is no agreement between measures

my profile: [2][3][1][5][3]

neighbour profile: [4][1][3][2][3]

measure              similarity  reading
pearson              -0.50       bad
weighted-pearson     -0.05       near zero
cosine angle          0.76       good
co-rated proportion   1.00       very good
concordance          -0.06       near zero
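As a check on how far two measures can disagree, the sketch below recomputes the pearson and cosine angle values for the two example profiles. It assumes every item is rated by both users; the class and method names are illustrative.

```java
import java.util.Locale;

class SimilarityDemo {
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Pearson correlation over two equally long, fully co-rated profiles.
    static double pearson(double[] a, double[] b) {
        double ma = mean(a), mb = mean(b);
        double num = 0.0, da = 0.0, db = 0.0;
        for (int i = 0; i < a.length; i++) {
            num += (a[i] - ma) * (b[i] - mb);
            da += (a[i] - ma) * (a[i] - ma);
            db += (b[i] - mb) * (b[i] - mb);
        }
        return num / Math.sqrt(da * db);
    }

    // Cosine of the angle between the two rating vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] me = {2, 3, 1, 5, 3};
        double[] neighbour = {4, 1, 3, 2, 3};
        // prints "pearson = -0.50"
        System.out.println(String.format(Locale.ROOT, "pearson = %.2f", pearson(me, neighbour)));
        // prints "cosine = 0.76"
        System.out.println(String.format(Locale.ROOT, "cosine = %.2f", cosine(me, neighbour)));
    }
}
```

The same pair of users is a poor match under one measure and a good match under the other. The sketch does not guard against a profile with zero variance, for which the Pearson denominator is zero.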

nodes = users

links = weighted according to similarity

each method will change the distribution of similarity across the graph

… the pearson distribution

[figure: Pearson Distribution; x-axis: Range (similarity bins from -1.0 to 1.0); y-axis: Proportion (0 to 0.09)]
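A distribution of this kind can be produced by binning every link weight in the graph. The sketch below is one way to do it, assuming weights lie in [-1, 1]; the class name, the method name and the sample weights are illustrative.

```java
import java.util.Arrays;

class SimilarityHistogram {
    // Proportion of weights falling in each bin of the given width over [-1, 1].
    static double[] proportions(double[] weights, double width) {
        int bins = (int) Math.round(2.0 / width);
        double[] result = new double[bins];
        for (double w : weights) {
            int bin = (int) Math.floor((w + 1.0) / width);
            // Clamp so that w = 1.0 lands in the last bin.
            bin = Math.max(0, Math.min(bin, bins - 1));
            result[bin] += 1.0 / weights.length;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical link weights; a real run would use all pairwise similarities.
        double[] weights = {-0.43, 0.57, 0.12, 0.87, 0.01, -0.65, 0.84, 0.22};
        System.out.println(Arrays.toString(proportions(weights, 0.05)));
    }
}
```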

… the modified pearson distributions

weighted-PCC, constrained-PCC

[figure: Modified Pearson Distributions; series: Weighted-PCC, Constrained-PCC; x-axis: Range (similarity bins from -1.0 to 1.0); y-axis: Proportion (0 to 0.35)]

… and other measures

somers’ d, co-rated, cosine angle

[figure: Other Distributions; series: Co-Rated, Somers, VSS; x-axis: Range (similarity bins from -1.0 to 1.0); y-axis: Proportion (0 to 0.7)]

an experiment with random numbers

what happens if we do this?

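java.util.Random r = new java.util.Random();

// neighbours: the array of my neighbours in the graph
double[] similarity = new double[neighbours.length];

for (int i = 0; i < neighbours.length; i++) {

    // uniform random weight in [-1.0, 1.0)
    similarity[i] = (r.nextDouble() * 2.0) - 1.0;

}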
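The results also report the R(0.5, 1.0) and Constant(1.0) baselines alongside R(-1.0, 1.0). A self-contained sketch of how such weightings could be generated follows; the class name, the fixed seed and the neighbourhood size are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

class RandomWeights {
    // n independent weights drawn uniformly from [lo, hi).
    static double[] uniform(Random r, int n, double lo, double hi) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = lo + r.nextDouble() * (hi - lo);
        }
        return w;
    }

    // n identical weights: every neighbour counts equally.
    static double[] constant(int n, double value) {
        double[] w = new double[n];
        Arrays.fill(w, value);
        return w;
    }

    public static void main(String[] args) {
        Random r = new Random(42); // fixed seed (an assumption) so runs are repeatable
        int neighbours = 5;        // hypothetical neighbourhood size
        System.out.println("R(-1.0, 1.0):  " + Arrays.toString(uniform(r, neighbours, -1.0, 1.0)));
        System.out.println("R(0.5, 1.0):   " + Arrays.toString(uniform(r, neighbours, 0.5, 1.0)));
        System.out.println("Constant(1.0): " + Arrays.toString(constant(neighbours, 1.0)));
    }
}
```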

accuracy

$$\mathrm{MAE} = \frac{\sum |r_{a,i} - p_{a,i}|}{N}$$

Neighborhood  Co-Rated  Somers’ d  PCC     wPCC    R(0.5, 1.0)  Constant(1.0)  R(-1.0, 1.0)
1             0.9449    0.9492     1.1150  0.9596  1.0665       1.0406         1.0341
10            0.8498    0.8355     1.0455  0.8277  0.9595       0.9495         0.9689
30            0.7979    0.7931     0.9464  0.7847  0.8903       0.9108         0.8848
50            0.7852    0.7817     0.9007  0.7733  0.8584       0.8922         0.8498
100           0.7759    0.7728     0.8136  0.7647  0.8222       0.8511         0.8153
153           0.7726    0.7727     0.7817  0.7638  0.8053       0.8243         0.8024
229           0.7717    0.7771     0.7716  0.7679  0.7919       0.7992         0.8058
459           0.7718    0.7992     0.8073  0.8025  0.7773       0.7769         0.7811

movielens u1 subset… cross-validation results in paper
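For reference, the mean absolute error over N predictions can be computed as in the sketch below. The class name and the toy ratings are illustrative and are not drawn from the MovieLens data.

```java
class MaeDemo {
    // Mean absolute error between actual ratings and predicted ratings.
    static double mae(double[] ratings, double[] predictions) {
        double total = 0.0;
        for (int i = 0; i < ratings.length; i++) {
            total += Math.abs(ratings[i] - predictions[i]);
        }
        return total / ratings.length;
    }

    public static void main(String[] args) {
        double[] ratings = {4, 3, 5, 2};         // hypothetical held-out ratings
        double[] predictions = {3.5, 3, 4, 3};   // hypothetical predictions
        System.out.println(mae(ratings, predictions)); // prints 0.625
    }
}
```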

coverage

$$\mathrm{Coverage} = \frac{\#\,\text{uncovered predictions}}{\#\,\text{predictions}}$$

Neighborhood  Co-Rated  Somers’ d  PCC      wPCC     Oracle
1             0.67795   0.57165    0.96725  0.61375  0.00495
10            0.15455   0.0999     0.80515  0.1114   0.00495
30            0.0512    0.0407     0.57225  0.04135  0.00495
50            0.03065   0.0266     0.3641   0.0251   0.00495
100           0.01515   0.01645    0.08345  0.01485  0.00495
153           0.00945   0.0122     0.0273   0.01135  0.00495
229           0.00715   0.00965    0.01165  0.00915  0.00495
459           0.00495   0.0054     0.00495  0.00495  0.00495

(best coverage when all of community used)

movielens u1 subset… cross-validation results in paper
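The coverage figures are the proportion of requested predictions that could not be made, so lower is better. A minimal sketch of that ratio, assuming an uncovered prediction is encoded as `Double.NaN` (an encoding chosen here for illustration, not stated in the talk):

```java
class CoverageDemo {
    // Fraction of requested predictions that could not be made.
    // An uncovered prediction is represented by Double.NaN in this sketch.
    static double uncoveredFraction(double[] predictions) {
        int uncovered = 0;
        for (double p : predictions) {
            if (Double.isNaN(p)) {
                uncovered++;
            }
        }
        return (double) uncovered / predictions.length;
    }

    public static void main(String[] args) {
        // Hypothetical outcome: two of five predictions had no usable neighbour.
        double[] predictions = {3.5, Double.NaN, 4.0, Double.NaN, 2.0};
        System.out.println(uncoveredFraction(predictions)); // prints 0.4
    }
}
```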

why do we get these results?

a) our error measures are not good enough?

$$\mathrm{MAE} = \frac{\sum |r_{a,i} - p_{a,i}|}{N}$$

$$\mathrm{Coverage} = \frac{\#\,\text{uncovered predictions}}{\#\,\text{predictions}}$$

J. Herlocker, J. Konstan, L. Terveen, and J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems, volume 22, pages 5–53. ACM Press, 2004.

S.M. McNee, J. Riedl, and J.A. Konstan. Being accurate is not enough: How accuracy metrics have hurt recommender systems. In Extended Abstracts of the 2006 ACM Conference on Human Factors in Computing Systems. ACM Press, 2006.

$$\mathrm{RMSE} = \sqrt{\frac{\sum (r_{a,i} - p_{a,i})^2}{N}}$$
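One way to see how the choice of error measure matters: two sets of predictions can share the same mean absolute error and still differ in root mean squared error, because squaring penalises large individual errors more heavily. The sketch below uses made-up ratings, and its names are illustrative.

```java
class RmseDemo {
    // Root mean squared error between actual ratings and predicted ratings.
    static double rmse(double[] ratings, double[] predictions) {
        double total = 0.0;
        for (int i = 0; i < ratings.length; i++) {
            double error = ratings[i] - predictions[i];
            total += error * error;
        }
        return Math.sqrt(total / ratings.length);
    }

    public static void main(String[] args) {
        double[] ratings = {1, 1, 1, 1};      // hypothetical held-out ratings
        double[] evenErrors = {2, 2, 2, 2};   // every prediction off by 1
        double[] oneBigError = {1, 1, 1, 5};  // one prediction off by 4
        // Both candidates have a mean absolute error of 1.0.
        System.out.println(rmse(ratings, evenErrors));  // prints 1.0
        System.out.println(rmse(ratings, oneBigError)); // prints 2.0
    }
}
```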

b) is there something wrong with the dataset?

c) is user-similarity not strong enough to capture the best recommender relationships in the graph?

one proposal…

N. Lathia, S. Hailes, L. Capra. Trust-Based Collaborative Filtering. To appear in IFIPTM 2008: Joint iTrust and PST Conferences on Privacy, Trust Management and Security. Trondheim, Norway. June 2008.

is modelling filtering as a trust-management problem a potential solution?

once we do that, more questions arise…

what other graph properties emerge from kNN collaborative filtering?

how does the graph evolve over time?

current work

N. Lathia, S. Hailes, L. Capra. Evolving Communities of Recommenders: A Temporal Evaluation. Research Note RN/08/01, Department of Computer Science, University College London. Under Submission.

N. Lathia, S. Hailes, L. Capra. kNN User Filtering: A Temporal Implicit Social Network. Current Work.

read more: http://mobblog.cs.ucl.ac.uk

trust, recommendations, …


questions?

http://mobblog.cs.ucl.ac.uk/
