A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis

Dec 14, 2015

Page 1: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

A more efficient Collaborative Filtering method

Tam Ming Wai

Dr. Nikos Mamoulis

Page 2: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Outline

Introduction to Collaborative Filtering
Special nature of CF
Inverted File Search Algorithm
Item-based
Slope-one
Hybrid method
No random access
Experiment

Page 3: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Collaborative Filtering

Look for opinions from friends with similar taste

The active user collaborates with other users

Trust those with similar taste more

Page 4: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Example

      i1  i2  i3  i4  i5  ia
ua     1   2   3   4   5   ?
u1     1   2   3   4   5   5
u2     5   4   3   2   1   1

ua trusts u1 more than u2

Page 5: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Special nature of CF

Trust your intuition in the following few slides

Page 6: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Searching for similar users

Which user is the best one to trust in order to predict “?” ?

Everyone

Only i2 is relevant

      i1  i2  i3  i4  ia
ua     -   2   -   -   ?
u1     -   2   -   -   3
u2     1   2   -   -   1
u3     -   2   2   -   4
u4     2   2   -   3   2
u5     1   2   2   1   4

Page 7: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Similarity

The similarity is not based on all attributes (the items)

Only the items which the active user rated are relevant

Although some authors (Breese et al.) suggested that more items could be considered (by default voting), this is not popular.

Page 8: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Searching for similar users

Which user is the best one to trust in order to predict “?” ?

Everyone except u5

      i1  i2  i3  i4  ia
ua     1   2   3   5   ?
u1     1   -   -   -   3
u2     -   2   -   -   1
u3     -   -   3   -   4
u4     -   -   -   5   2
u5     -   -   -   -   4

Page 9: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Similarity

The similarity is not based on all attributes (the items)

Only the items which both the active user and the user under consideration rated are relevant

Page 10: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

A Notice

ua is similar to u1, u2, u3 and u4

BUT

u1, u2, u3 and u4 are completely unrelated to each other

Page 11: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Searching for similar users

Which user is the best one to trust in order to predict “?” ?

u3 is the one.

Only u3 is relevant

      i1  i2  i3  i4  ia
ua     1   2   3   4   ?
u1     1   2   3   5   -
u2     2   3   1   4   -
u3     4   3   2   1   4
u4     2   1   1   3   -
u5     1   4   2   1   -

Page 12: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Top-k most similar users

It is not the top-k among all users

It is the top-k among the users who rated ia

Page 13: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Summary on the nature

The matrix is incomplete

Similarity

The set of items could be different for every pair of users (the intersection)

The set of users (the candidates) could be different for each query (those who rated ia)

No triangle inequality (in the extreme case, ua is similar to u1 and u2, but u1 and u2 can be unrelated to each other)

Page 14: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Popular Similarity measure

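Very often, the Pearson correlation is used:

w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \sum_j (v_{i,j} - \bar{v}_i)^2}}

j iterates through the items that are rated by both user i and user a

v_{a,j}: vote (rating) on item j by user a

\bar{v}_a: average vote (rating) of user a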
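
A minimal Python sketch of the Pearson correlation over co-rated items, using the ratings from the earlier example slide. The function name and the dictionary layout are illustrative assumptions; the averages are taken over each user's full profile, which is also an assumption (some variants average over the co-rated items only).

```python
from math import sqrt

def pearson(ratings_a, ratings_i):
    """Pearson correlation w(a, i) over the items rated by both users.

    Each argument maps item id -> rating. Averages use each user's
    full profile (assumption for this sketch).
    """
    common = ratings_a.keys() & ratings_i.keys()
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_i = sum(ratings_i.values()) / len(ratings_i)
    s_ai = sum((ratings_a[j] - mean_a) * (ratings_i[j] - mean_i) for j in common)
    s_aa = sum((ratings_a[j] - mean_a) ** 2 for j in common)
    s_ii = sum((ratings_i[j] - mean_i) ** 2 for j in common)
    if s_aa == 0 or s_ii == 0:
        return 0.0  # a flat profile has no defined correlation
    return s_ai / sqrt(s_aa * s_ii)

# Ratings from the example slide: u1 agrees with ua, u2 is the opposite.
ua = {"i1": 1, "i2": 2, "i3": 3, "i4": 4, "i5": 5}
u1 = {"i1": 1, "i2": 2, "i3": 3, "i4": 4, "i5": 5, "ia": 5}
u2 = {"i1": 5, "i2": 4, "i3": 3, "i4": 2, "i5": 1, "ia": 1}

w1 = pearson(ua, u1)
w2 = pearson(ua, u2)
```

Here w1 comes out strongly positive and w2 strongly negative, which matches the intuition that ua trusts u1 more than u2.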

Page 15: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Output - Prediction

C is the set of users who rated the queried item
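
The prediction formula itself did not survive in this transcript. As a hedged sketch, the following assumes the common weighted-deviation form (in the style of Breese et al.): the active user's average plus the similarity-weighted deviations of the users in C from their own averages, normalised by the sum of absolute weights. The function name and tuple layout are illustrative.

```python
def predict(mean_a, advisors):
    """Predict the active user's rating on the queried item.

    mean_a   -- average rating of the active user
    advisors -- list of (w, rating, mean) tuples, one per user in C:
                similarity to the active user, that user's rating on
                the queried item, and that user's average rating
    Assumed form: mean_a + sum(w * (rating - mean)) / sum(|w|).
    """
    norm = sum(abs(w) for w, _, _ in advisors)
    if norm == 0:
        return mean_a  # no usable advisor: fall back to the user's average
    offset = sum(w * (rating - mean) for w, rating, mean in advisors)
    return mean_a + offset / norm

p = predict(3.0, [(1.0, 5, 4.0), (0.5, 2, 3.0)])
```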

Page 16: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Brute Force Searching

Given an active user and an active movie:

Relevant movies are known from the active user profile

Candidates are known from the active movie profile

Find sim(ua, ui) for all ui in the candidate set

The top-k are used as advisors
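
The brute-force search can be sketched as follows. The names are hypothetical, and the similarity function is a stand-in (negative mean absolute difference over co-rated items), not the measure used in the talk; any similarity could be passed in.

```python
import heapq

def mean_abs_sim(a, b):
    """Stand-in similarity: negative mean absolute difference on co-rated items."""
    common = a.keys() & b.keys()
    if not common:
        return float("-inf")
    return -sum(abs(a[j] - b[j]) for j in common) / len(common)

def brute_force_top_k(active, item, user_profiles, sim, k):
    # Candidates: every user whose profile contains the active item.
    candidates = [u for u, p in user_profiles.items() if item in p]
    scored = ((sim(active, user_profiles[u]), u) for u in candidates)
    return heapq.nlargest(k, scored)

profiles = {
    "u1": {"i2": 2, "i3": 3, "ia": 4},
    "u2": {"i1": 2, "i2": 3, "i3": 1},
    "u3": {"i1": 4, "i3": 2, "i4": 1, "ia": 4},
    "u4": {"i1": 2, "i2": 1, "i4": 3, "ia": 3},
    "u5": {"i1": 1, "i3": 2, "i4": 1},
}
ua = {"i1": 1, "i2": 2, "i4": 4}
advisors = brute_force_top_k(ua, "ia", profiles, mean_abs_sim, k=2)
```

Every candidate costs one full similarity computation, which is why this approach becomes slow when many users rated the active movie.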

Page 17: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Useful Information

What is the useful information?

      i1  i2  i3  i4  ia
ua     1   2   -   4   ?
u1     -   2   3   -   4
u2     2   3   1   -   -
u3     4   -   2   1   4
u4     2   1   -   3   3
u5     1   -   2   1   -

Page 19: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Useful Information

What is the useful information?

The green entries are useful

      i1  i2  i3  i4  ia
ua     1   2   -   4   ?
u1     -   2   3   -   4
u2     2   3   1   -   -
u3     4   -   2   1   4
u4     2   1   -   3   3
u5     1   -   2   1   -

Page 20: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Useful Information

All user profiles, or all movie profiles, contain the useful information

      i1  i2  i3  i4  ia
ua     1   2   -   4   ?
u1     -   2   3   -   4
u2     2   3   1   -   -
u3     4   -   2   1   4
u4     2   1   -   3   3
u5     1   -   2   1   -

Page 21: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Inverted file

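        Item
User    1  2  3  4  5  6
1       -  1  -  3  4  5
2       1  3  4  5  -  5
3       -  3  -  4  1  -

Inverted file, one list of (user, rating) pairs per item:

Item 1: (2, 1)
Item 2: (1, 1) (2, 3) (3, 3)
Item 3: (2, 4)
Item 4: (1, 3) (2, 5) (3, 4)
Item 5: (1, 4) (3, 1)
Item 6: (1, 5) (2, 5)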

Coster & Svensson 2002
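
A small sketch of how such an inverted file could be built from the user profiles on this slide. The function name and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict

def build_inverted_file(user_profiles):
    """Map each item to the list of (user, rating) pairs that rated it."""
    inverted = defaultdict(list)
    for user in sorted(user_profiles):
        for item, rating in user_profiles[user].items():
            inverted[item].append((user, rating))
    return dict(inverted)

# The 3 users x 6 items matrix from the slide; missing ratings are omitted.
profiles = {
    1: {2: 1, 4: 3, 5: 4, 6: 5},
    2: {1: 1, 2: 3, 3: 4, 4: 5, 6: 5},
    3: {2: 3, 4: 4, 5: 1},
}
inverted = build_inverted_file(profiles)
```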

Page 22: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Pearson Correlation

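The active user is fixed in a single query

For each user i, there are 3 summations

Instead of calculating w(a,i) for one user i at a time, accumulate SAI[i], SAA[i] and SII[i] for all users (with the help of the inverted lists)

With v_{a,j} the rating of user a on item j and \bar{v}_a the average rating of user a:

SAA[i] = \sum_j (v_{a,j} - \bar{v}_a)^2

SAI[i] = \sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)

SII[i] = \sum_j (v_{i,j} - \bar{v}_i)^2

w(a,i) = SAI[i] / \sqrt{SAA[i] \cdot SII[i]}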
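
A hedged sketch of the accumulator idea: walk the inverted list of every item the active user rated and update the three sums for each user found there. The data reuses the small matrix from the inverted file slide; the function name, the layout and the use of full-profile averages are assumptions.

```python
from collections import defaultdict
from math import sqrt

def pearson_all(active, mean_a, inverted, means):
    """Compute w(a, i) for every user reachable through the inverted lists."""
    s_ai = defaultdict(float)
    s_aa = defaultdict(float)
    s_ii = defaultdict(float)
    for item, r_a in active.items():
        d_a = r_a - mean_a
        for user, r_i in inverted.get(item, []):
            d_i = r_i - means[user]
            s_ai[user] += d_a * d_i
            s_aa[user] += d_a * d_a
            s_ii[user] += d_i * d_i
    return {
        u: s_ai[u] / sqrt(s_aa[u] * s_ii[u])
        for u in s_ai
        if s_aa[u] > 0 and s_ii[u] > 0
    }

inverted = {
    1: [(2, 1)],
    2: [(1, 1), (2, 3), (3, 3)],
    3: [(2, 4)],
    4: [(1, 3), (2, 5), (3, 4)],
    5: [(1, 4), (3, 1)],
    6: [(1, 5), (2, 5)],
}
means = {1: 13 / 4, 2: 18 / 5, 3: 8 / 3}   # average rating of each user
active = {2: 1, 4: 3, 5: 4, 6: 5}          # same ratings as user 1
weights = pearson_all(active, 13 / 4, inverted, means)
```

Only the lists of items rated by the active user are touched, so users with no item in common never receive an accumulator.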

Page 23: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Early Termination

Self-Indexing Inverted Files for Fast Text Retrieval, Alistair Moffat and Justin Zobel, 1994

Quit: stop when the number of users reaches a threshold

Continue: stop considering new users when the number of users reaches a threshold
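
A simplified sketch of the two strategies applied to score accumulators. The names are hypothetical, and checking the threshold only after each whole list is an assumption of this sketch.

```python
def accumulate(lists, limit, strategy):
    """Accumulate per-user contributions from inverted lists.

    lists    -- iterable of inverted lists, each a list of (user, value)
    limit    -- threshold on the number of users with an accumulator
    strategy -- "quit": stop processing once the limit is reached
                "continue": keep updating known users, add no new ones
    """
    acc = {}
    for postings in lists:
        for user, value in postings:
            if user in acc:
                acc[user] += value
            elif len(acc) < limit:
                acc[user] = value
        if strategy == "quit" and len(acc) >= limit:
            break
    return acc

lists = [[("u1", 1), ("u2", 1)], [("u1", 1), ("u3", 1)]]
quit_result = accumulate(lists, limit=2, strategy="quit")
continue_result = accumulate(lists, limit=2, strategy="continue")
```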

Page 24: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Item-based

The matrix is symmetric

Exchange the roles of rows (user profiles) and columns (movie profiles)

Look for movies which are similar to the active movie

If users act similarly on both movies, the active user may act similarly too.

Page 25: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Item-based example

The users act exactly the same on i2 and ia

Perhaps i2 and ia are very similar

"?" may be 1, as ua gave i2 a rating of 1

      i1  i2  i3  i4  ia
ua     1   1   3   4   ?
u1     1   1   3   5   1
u2     2   2   1   4   2
u3     4   4   2   1   4
u4     2   4   1   3   4
u5     1   5   2   1   5

Page 26: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Sarwar et al. 2001: pre-compute the top-k similar items

Amazon.com: personalized promotion of the top-k similar items

Page 27: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Slope-one

Not only find similar items

Measure the pattern between items

Lemire & Maclachlan 2005

Page 28: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Slope-one

For an item pair j and i

For all users who rated both items

Find the average difference in rating

Page 29: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Slope-one

A prediction is made based on dev_{j,i}

Page 30: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Slope-one example

All users rated ia higher than i3 by 1

By considering ia and i3, ua may rate "?" as 4

      i1  i2  i3  i4  ia
ua     1   4   3   4   ?
u1     1   2   1   5   2
u2     2   2   1   4   2
u3     4   4   3   1   4
u4     2   4   3   3   4
u5     1   5   4   1   5
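
The example can be checked with a short sketch: the deviation between two items is the average rating difference over the users who rated both, and a single-pair prediction adds that deviation to the active user's rating. Function names are illustrative, and only one item pair is used here rather than the full Slope One average over all pairs.

```python
def deviation(profiles, j, i):
    """Average of (rating on j - rating on i) over users who rated both."""
    diffs = [p[j] - p[i] for p in profiles.values() if j in p and i in p]
    return sum(diffs) / len(diffs) if diffs else None

def predict_from_item(active, profiles, target, known):
    """Predict the active user's rating on `target` from one item `known`."""
    dev = deviation(profiles, target, known)
    return None if dev is None else active[known] + dev

profiles = {
    "u1": {"i1": 1, "i2": 2, "i3": 1, "i4": 5, "ia": 2},
    "u2": {"i1": 2, "i2": 2, "i3": 1, "i4": 4, "ia": 2},
    "u3": {"i1": 4, "i2": 4, "i3": 3, "i4": 1, "ia": 4},
    "u4": {"i1": 2, "i2": 4, "i3": 3, "i4": 3, "ia": 4},
    "u5": {"i1": 1, "i2": 5, "i3": 4, "i4": 1, "ia": 5},
}
ua = {"i1": 1, "i2": 4, "i3": 3, "i4": 4}

dev = deviation(profiles, "ia", "i3")                  # 1.0
guess = predict_from_item(ua, profiles, "ia", "i3")    # 3 + 1.0 = 4.0
```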

Page 31: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Summary

A common argument: there are fewer items than users

Pre-computation: similarity in item-based, dev_{j,i} in slope-one

Page 32: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Hybrid method

Finding top-k similar users

Brute force: inefficient when the number of candidates is large

Inverted file: inefficient when the number of relevant items is large

Mixing the two

Page 33: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Hybrid method

Inverted file again

The files are segmented according to ratings

Page 34: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Segmented inverted file example

I1

All users here gave I1 a rating of 5

All users here gave I1 a rating of 4

All users here gave I1 a rating of 3

All users here gave I1 a rating of 2

All users here gave I1 a rating of 1

Page 35: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Accessing Segmented inverted file

First access the segments which are closer to the active user’s rating

Page 36: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Access example

I1

Access order 1, d=0 (ua here)

Access order 2, d=1

Access order 3, d=1

Access order 4, d=2

Access order 5, d=3
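
The access order can be sketched by sorting the rating segments on their distance d to the active user's rating. The 1 to 5 scale and the active rating of 4 are assumptions chosen so that the distances reproduce the sequence 0, 1, 1, 2, 3 shown above; how ties between equal distances are broken is also an assumption.

```python
def access_order(active_rating, segments=(5, 4, 3, 2, 1)):
    """Return the rating segments ordered by distance to the active rating."""
    # sorted() is stable, so segments at equal distance keep their input order.
    return sorted(segments, key=lambda r: abs(r - active_rating))

order = access_order(4)
distances = [abs(r - 4) for r in order]
```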

Page 37: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Accessing Segmented inverted file

The inverted file is a list ranked on d (distance to ua’s rating)

The best bound on similarity can be found

Page 38: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Algorithm

Phase 1

Access all inverted lists, such that all d=0 segments are loaded

Starting from the most frequently seen candidates, find the actual similarity (k candidates are needed in total)

The similarity of the k-th candidate whose actual similarity is known will be the initial filter

Page 39: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

[Figure, repeated on pages 39 to 41: columns labeled I1 I2 I3 I4 I5]

Page 42: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Algorithm phase 1 example

candidate   actual similarity
u3          0.89
u8          0.88
…           …
u1          0.77
u9          0.70   (k-th candidate: the filter)

Page 43: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Algorithm

Phase 2: keep loading from the inverted lists

The best bound of the similarity decreases

Similarity bound is worse than the filter => pruned

The partial information becomes more complete

Update the filter after some number of segments are loaded

Stop when the number of remaining candidates is small
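
The pruning step alone might look like the sketch below. The names are hypothetical and the bound values are made up for illustration; how the best bounds are maintained is not shown.

```python
def prune(best_bounds, filter_value):
    """Keep only candidates whose best possible similarity can still reach the filter."""
    return {u: b for u, b in best_bounds.items() if b >= filter_value}

# Made-up best bounds for three candidates, with the filter at 0.70.
remaining = prune({"u3": 0.95, "u7": 0.65, "u9": 0.70}, 0.70)
```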

Page 44: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Algorithm – phase 2

In the implementation, the items which ua rated with extreme values (close to 1 or 5) are loaded first

The candidates’ best bounds drop faster

Page 45: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

[Figure, repeated on pages 45 to 50: columns labeled I1 I2 I3 I4 I5]

Page 51: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Similarity measure

Additive, L1

Segmental Manhattan Distance (SMD) = Manhattan Distance / # of relevant items

Sim = 1 - SMD / (maximum distance)
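
A minimal sketch of this measure. It assumes that "relevant items" means the items rated by both users and that the maximum distance is 4 (a 1 to 5 rating scale); the function name is illustrative.

```python
def smd_similarity(active, other, max_distance=4):
    """Sim = 1 - SMD / max_distance, with SMD the mean L1 distance on co-rated items."""
    common = active.keys() & other.keys()
    if not common:
        return None  # no relevant item, similarity undefined
    manhattan = sum(abs(active[j] - other[j]) for j in common)
    smd = manhattan / len(common)
    return 1 - smd / max_distance

ua = {"i1": 1, "i2": 2, "i3": 3, "i4": 4, "i5": 5}
same = {"i1": 1, "i2": 2, "i3": 3, "i4": 4, "i5": 5, "ia": 5}
one_item = {"i1": 1, "ia": 1}
opposite = {"i1": 5, "i2": 4, "i3": 3, "i4": 2, "i5": 1}

sim_same = smd_similarity(ua, same)          # identical on 5 items
sim_one = smd_similarity(ua, one_item)       # identical on 1 item only
sim_opposite = smd_similarity(ua, opposite)  # 1 - (12 / 5) / 4
```

Note that a user agreeing on a single item scores as high as one agreeing on five items, since the distance is averaged over the co-rated items.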

Page 52: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Horting

To ensure the intersection of items is large enough

Aggarwal et al

Page 53: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Horting

      i1  i2  i3  i4  i5  ia
ua     1   2   3   4   5   ?
u1     1   2   3   4   5   5
u2     1   -   -   -   -   1

Sim(ua, u1) = Sim(ua, u2)

u2 is less reliable

Page 54: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Best bound

We have ‘user num of appearance’

‘max num of more appearance’ = min(ua_profile.len, ui_profile.len) - ‘user num of appearance’

if never see this user in any segment
    best distance = 1
else if ( partial distance > 1 )
    The user appears in unseen items, and d=1
else if (‘max num of more appearance’ < horting_factor)
    The user appears enough number of times only
else
    The user does not appear anymore, partial distance is the best

Page 55: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

No random access

The inverted file is a list ranked on d (distance to ua’s rating)

Nikos Mamoulis, Kit Hung Cheng, Man Lung Yiu, and David W. Cheung 2006

Phase 1

Do not find any actual similarity until the best bound of an unseen user is worse than the k-th best worst bound

Page 56: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

[Figure, repeated on pages 56 to 60: columns labeled I1 I2 I3 I4 I5]

Page 61: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Worst Bound

While a user’s partial distance is smaller than the maximum possible distance, include the distance

Page 62: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

No random access

Phase 2

Find actual similarity and prune candidates

Page 63: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Experiment

Netflix dataset

480189 users

17770 movies

100 million ratings (1.17% of the user-movie matrix)

k = 50

h = 10
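
The 1.17% figure is the fraction of filled cells in the user-movie matrix, which can be checked with a few lines (taking "100 million" as exactly 100,000,000 ratings, an assumption):

```python
users = 480_189
movies = 17_770
ratings = 100_000_000  # "100 million", taken as a round number

# Fraction of the user-movie matrix that actually holds a rating.
density = ratings / (users * movies)
print(f"{density:.2%}")  # prints 1.17%
```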

Page 64: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Efficiency

Method        Time per query
Brute force   185.24 s
Hybrid        25.85 s
NRA           59.34 s

Page 65: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Disk IO statistic (hybrid)

% of actual similarity: 7.60%

% of entries loaded from inverted file: 68.52%

% of entries which are loaded and relevant: 49.77%

Page 66: A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.

Reference

Breese et al. Empirical Analysis of Predictive Algorithms for Collaborative Filtering

Coster & Svensson 2002. Inverted File Search Algorithms for Collaborative Filtering

Lemire & Maclachlan 2005. Slope One Predictors for Online Rating-Based Collaborative Filtering

Sarwar et al. 2001. Item-Based Collaborative Filtering Recommendation Algorithms

Aggarwal et al. Horting Hatches an Egg: A New Graph-Theoretic Approach to Collaborative Filtering

Nikos Mamoulis, Kit Hung Cheng, Man Lung Yiu, and David W. Cheung 2006. Efficient Aggregation of Ranked Inputs