Collaborative Filtering Recommendation Techniques
Yehuda Koren
Recommendation Types
• Editorial
• Simple aggregates: Top 10, Most Popular, Recent Uploads
• Tailored to individual users: books, CDs, other products at amazon.com; movies by Netflix, MovieLens; TV shows by TiVo; …
Recommendation Process
• Collect "known" user-item ratings
• Extrapolate unknown ratings from the known ratings
• Estimate ratings for the items a user has not yet seen
• Recommend the items with the highest estimated ratings to the user (a minimal sketch follows)
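That last step can be sketched in a few lines of Python (the `estimate` function stands in for whatever rating predictor is used; all names here are illustrative):

```python
def recommend(user, unseen_items, estimate, n=10):
    """Return the n items with the highest estimated ratings for this user."""
    ranked = sorted(unseen_items, key=lambda item: estimate(user, item), reverse=True)
    return ranked[:n]
```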
Collaborative Filtering
• Recommend items based on past transactions of many users
• Analyze relations between users and/or items
• Specific data characteristics are irrelevant
– Domain-free: user/item attributes are not necessary
– Can identify elusive aspects
“We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules
• Goal: improve on Netflix's existing movie recommendation technology, Cinematch
• Criterion: reduction in root mean squared error (RMSE); see the sketch below
• Oct '06: contest began
• Oct '07: $50K progress prize for 8.43% improvement
• Oct '08: $50K progress prize for 9.44% improvement
• Sept '09: $1 million grand prize for 10.06% improvement
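For reference, the RMSE criterion transcribes directly into Python (a straightforward sketch, not contest code):

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual) ** 2))

print(rmse([3.5, 4.0, 2.0], [4, 4, 3]))  # ~0.645
```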
Movie rating data
[Figure: training data shown as (user, movie, score) triples; test data as (user, movie, ?) pairs with the scores withheld]
• Training data
– 100 million ratings
– 480,000 users
– 17,770 movies
– 6 years of data: 2000-2005
• Test data
– last few ratings of each user (2.8 million)
• Dates of ratings are given
Test Data Split into Three Pieces
• Probe
– ratings released
– allows participants to assess methods directly
• Daily submissions allowed for combined Quiz/Test data
– RMSE released for Quiz
– prizes based on Test RMSE
– identity of Quiz cases withheld
– Test RMSE withheld
[Figure: all data (~103M user-item pairs) splits into training data and a hold-out set (last 9 ratings for each user: 4.2M pairs); the hold-out set is randomly split three ways into Probe (labels provided) and Quiz/Test (labels retained by Netflix for scoring)]
#ratings per user
• Avg #ratings/user: 208
Most Active Users
User ID    # Ratings   Mean Rating
305344     17,651      1.90
387418     17,432      1.81
2439493    16,560      1.22
1664010    15,811      4.26
2118461    14,829      4.08
1461435     9,820      1.37
1639792     9,764      1.33
1314869     9,739      2.95
#ratings per movie
• Avg #ratings/movie: 5627
Movies Rated Most Often
Title                      # Ratings   Mean Rating
Miss Congeniality          227,715     3.36
Independence Day           216,233     3.72
The Patriot                200,490     3.78
The Day After Tomorrow     194,695     3.44
Pretty Woman               190,320     3.90
Pirates of the Caribbean   188,849     4.15
The Green Mile             180,883     4.31
Forrest Gump               180,736     4.30
Important RMSEs

Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
Cinematch: 0.9514 (baseline)
Prize '07 (BellKor): 0.8712
Prize '08 (BellKor+BigChaos): 0.8616
Grand Prize (BellKor's Pragmatic Chaos): 0.8554
Inherent noise: ????

[Figure: RMSE scale from erroneous to accurate; the personalization spectrum runs from random ranking and bestsellers lists down to the prize-winning models]
Problem Definition
[Figure: sparse ratings matrix, users #1 … #480,000 by items #1 … #17,770, with only a few observed entries such as 1, 4, 3, 2]
Users/items arranged in ratings space
Latent factorization algorithm
• Dim(Users) ≠ Dim(Items) (e.g., 17,770 vs. 480,000)
• Sparse data, with non-uniformly missing entries
Latent factor methods
Latent factorization algorithm
[Figure: User-1 … User-480K and Item-1 … Item-17,770 each mapped to a dense latent factor vector]
Users/items arranged in joint dense latent factors space
A 2-D factor space
[Figure: movies placed on two latent axes, "geared towards females" vs. "geared towards males" and "serious" vs. "escapist"; examples include The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility, Gus, Dave]
Basic matrix factorization model
[Figure: a users × items rating matrix approximated by the product of a users × 3 factor matrix and a 3 × items factor matrix]
A rank-3 SVD approximation
Estimate unknown ratings as inner-products of factors:
[Figure: the same rank-3 SVD approximation; a missing entry "?" in the rating matrix is estimated as the inner product of the corresponding user-factor and item-factor vectors, here 2.4]
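To make the inner-product estimate concrete, a minimal numeric sketch (the factor values are illustrative, chosen to reproduce the 2.4 in the figure):

```python
import numpy as np

p_u = np.array([1.2, 0.8, 1.0])  # a user's rank-3 factor vector (illustrative)
q_i = np.array([1.0, 0.5, 0.8])  # an item's rank-3 factor vector (illustrative)
r_hat = p_u @ q_i                # estimated rating = inner product of factors
print(round(r_hat, 2))           # 2.4
```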
Matrix factorization model
[Figure: the rating matrix factored into two lower-rank matrices]
Idea:
• Approximate the rating matrix as the product of two lower-rank matrices: R ≈ PQ
Properties:
• SVD isn't defined when entries are unknown ⇒ use specialized methods
• Very powerful model ⇒ can easily overfit; sensitive to regularization
A regularized model
• User factors: model a user $u$ as a vector $p_u \sim \mathcal{N}_k(\mu, \Sigma)$
• Movie factors: model a movie $i$ as a vector $q_i \sim \mathcal{N}_k(\gamma, \Lambda)$
• Ratings: measure "agreement" between $u$ and $i$: $r_{ui} \sim \mathcal{N}(p_u^T q_i, \varepsilon^2)$
• Simplifying assumptions: $\mu = \gamma = 0$, $\Sigma = \Lambda = \lambda I$
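These assumptions tie the probabilistic view to the optimization view: maximizing the posterior of the factors given the observed ratings (a standard MAP derivation, sketched here) is equivalent to the regularized least-squares objective on the next slide, with regularization constant $\varepsilon^2/\lambda$:

$$-2\log p(P, Q \mid R) = \frac{1}{\varepsilon^2} \sum_{\text{known } r_{ui}} \left(r_{ui} - p_u^T q_i\right)^2 + \frac{1}{\lambda} \left( \sum_u \|p_u\|^2 + \sum_i \|q_i\|^2 \right) + \text{const}$$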
Matrix factorization as a cost function

Rating prediction: $\hat{r}_{ui} = p_u^T q_i$, where $r_{ui}$ is the rating by user $u$ for item $i$, $p_u$ is the user factor of $u$, and $q_i$ is the item factor of $i$.

$$\min_{p_*,\,q_*} \sum_{\text{known } r_{ui}} \left(r_{ui} - p_u^T q_i\right)^2 + \lambda\left(\|p_u\|^2 + \|q_i\|^2\right)$$

The first term is the squared prediction error; the $\lambda$ term is regularization.

• Optimize by either stochastic gradient descent or alternating least squares
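As a concrete check, the objective transcribes directly into Python (a sketch; P and Q are arrays whose rows hold the user and item factor vectors, and the names are illustrative):

```python
import numpy as np

def cost(ratings, P, Q, lam):
    """Regularized squared error over the known ratings."""
    total = 0.0
    for u, i, r in ratings:               # only known (u, i, r) triples
        err = r - P[u] @ Q[i]             # prediction error
        total += err ** 2 + lam * (P[u] @ P[u] + Q[i] @ Q[i])
    return total
```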
Stochastic gradient descent optimization

Perform until convergence. For each training example $r_{ui}$ (the rating by user $u$ for item $i$):
– Compute the prediction error: $e_{ui} = r_{ui} - p_u^T q_i$
– Update the item factor: $q_i \leftarrow q_i + \gamma\,(e_{ui}\, p_u - \lambda\, q_i)$
– Update the user factor: $p_u \leftarrow p_u + \gamma\,(e_{ui}\, q_i - \lambda\, p_u)$

• Two constants to tune: $\gamma$ (step size) and $\lambda$ (regularization)
• Cross-validation: find the values that minimize error on a held-out test set
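Taken together, the updates yield the following minimal Python sketch (the dimension k, step size, regularization, and epoch count are illustrative choices, not the contest settings):

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=20, gamma=0.005, lam=0.02, n_epochs=20):
    """Fit R ~ P Q^T by stochastic gradient descent.

    ratings: list of known (u, i, r) triples.
    Returns P (n_users x k) and Q (n_items x k).
    """
    rng = np.random.default_rng(0)
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            e = r - P[u] @ Q[i]                    # prediction error e_ui
            pu, qi = P[u].copy(), Q[i].copy()      # use the pre-update values
            P[u] += gamma * (e * qi - lam * pu)    # user-factor update
            Q[i] += gamma * (e * pu - lam * qi)    # item-factor update
    return P, Q

# Prediction is the inner product of the learned factors:
# P, Q = factorize(train, n_users=480_000, n_items=17_770)
# r_hat = P[u] @ Q[i]
```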
[Figure: step-by-step SGD factorization on a small example, by Domonkos Tikk and the Gravity Team; a sparse rating matrix R is approximated by factor matrices P and Q, and their product fills in the missing entries]
Data normalization
• Most variability in the observed data is driven by user-specific and item-specific effects, regardless of the user-item interaction
• Examples:
– Some movies are systematically rated higher
– Some movies were rated by users who tend to rate low
– Ratings change over time
• The data must be adjusted to account for these main effects
• This stage requires the most insight into the nature of the data
• It can make a big difference…
Components of a rating predictor

$$\hat{r}_{ui} = b_u + b_i + p_u^T q_i$$
(user bias + item bias + user-item interaction)

Biases
• Separate users and movies
• Often overlooked
• Benefit from insights into users' behavior

User-item interaction
• Characterizes the match between users and items
• Attracts most research in the field
• Benefits from algorithmic and mathematical innovations
A bias estimator
• We have expectations about the rating by user $u$ for item $i$, even without estimating $u$'s attitude towards items like $i$:
– the rating scale of user $u$
– the values of other ratings the user gave recently
– the (recent) popularity of item $i$
– selection bias
Biases: an example
• Mean rating: 3.7 stars
• The Sixth Sense is 0.5 stars above average
• Joe rates 0.2 stars below average
Baseline estimation: Joe will rate The Sixth Sense 4 stars (3.7 + 0.5 − 0.2 = 4.0)
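The same baseline estimator as a minimal Python sketch (bias values are taken from the example above; regularized estimation of the biases from data is omitted):

```python
def baseline(mu, item_bias, user_bias):
    """Baseline rating estimate: global mean plus item and user biases."""
    return mu + item_bias + user_bias

# Mean 3.7; The Sixth Sense is +0.5 above average; Joe is -0.2 below average:
print(round(baseline(3.7, 0.5, -0.2), 1))  # 4.0
```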
Sources of Variance in Netflix data

Total variance 1.276 = 0.732 (unexplained, 57%) + 0.415 (biases, 33%) + 0.129 (personalization, 10%)

Biases matter!
Exploring Temporal Effects
[Figure: Netflix ratings by date; a marked shift appears around early 2004]
Something Happened in Early 2004…
Are movies getting better with time?
Multiple sources of temporal dynamics
• Item-side effects:
– product perception and popularity are constantly changing
– seasonal patterns influence items' popularity
• User-side effects:
– customers redefine their taste
– transient, short-term bias; anchoring
– drifting rating scale
– change of rater within a household
Introducing temporal dynamics into biases
• Biases tend to capture the most pronounced aspects of temporal dynamics
• We observe changes in:
1. the rating scale of individual users (user bias)
2. the popularity of individual items (item bias)

Static model: $\hat{r}_{ui} = b_u + b_i + p_u^T q_i$

Add temporal dynamics: $\hat{r}_{ui}(t) = b_u(t) + b_i(t) + p_u^T q_i$
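One simple way to realize a time-dependent bias such as $b_i(t)$, sketched here under the assumption of discrete time bins (the bin count and all names are illustrative, not the talk's actual parameterization):

```python
import numpy as np

N_BINS = 30  # illustrative number of time bins over the data's date range

def time_bin(t, t_min, t_max, n_bins=N_BINS):
    """Map a rating date t (e.g., days since the data start) to a bin index."""
    frac = (t - t_min) / (t_max - t_min + 1e-9)
    return min(int(frac * n_bins), n_bins - 1)

b_i_static = 0.1            # static part of an item's bias (example value)
b_i_bin = np.zeros(N_BINS)  # learned per-bin offsets (left at zero here)

def item_bias(t, t_min, t_max):
    """Time-dependent item bias: b_i(t) = static part + per-bin offset."""
    return b_i_static + b_i_bin[time_bin(t, t_min, t_max)]
```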
General Lessons and Experience
Some users are more predictable…
• 1.4M predictions are split into 10 equal bins based on #ratings per user
[Figure: RMSE vs. #ratings per user; bin averages range from 12 to 918.5 ratings, RMSE axis from 0.7 to 1.0]
Factor models: Error vs. #parameters
[Figure: RMSE (0.875-0.91) vs. millions of parameters (10 to 100,000) for NMF, BiasSVD, SVD++, SVD v.2, SVD v.3, and SVD v.4; each curve is annotated with its number of factors (40 up to 1500); reference lines at Netflix (Cinematch) 0.9514 and the prize target 0.8563]
Ratings are not given at random!
Marlin, Zemel, Roweis, Slaney, "Collaborative Filtering and the Missing at Random Assumption", UAI 2007
[Figure: distribution of ratings in Yahoo! survey answers, Yahoo! music ratings, and Netflix ratings]
Which movies do users rate?
• A powerful source of information: characterize users by which movies they rated, rather than how they rated them
• A dense binary representation of the data: $R = \{r_{ui}\}_{u,i} \rightarrow B = \{b_{ui}\}_{u,i}$
[Figure: the sparse users × movies rating matrix R alongside its binary counterpart B, with 1 wherever a rating exists and 0 elsewhere]
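A minimal sketch of this binarization (assuming the ratings arrive as (user, item, rating) triples; names are illustrative):

```python
import numpy as np

def binarize(ratings, n_users, n_items):
    """Build the binary matrix B: 1 where a rating exists, 0 elsewhere."""
    B = np.zeros((n_users, n_items), dtype=np.int8)
    for u, i, _r in ratings:  # the rating value itself is deliberately ignored
        B[u, i] = 1
    return B
```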
The Wisdom of Crowds (of Models)
• "All models are wrong; some are useful" – G. Box
– some miss strong "local" relationships, e.g., among sequels
– others miss the cumulative effect of many small signals
– each complements the others
• Our best entry during Year 1 was a linear combination of 107 sets of predictions
• Our final solution was a linear blend of over 700 prediction sets (a sketch of such a blend follows)
– many variations of model structure and parameter settings
• Mega-blends are not needed in practice
– a handful of simple models achieves 90% of the improvement of the full blend
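A hedged sketch of such a linear blend: fit weights over the component models' predictions on a held-out (probe) set, here by plain least squares (the use of unconstrained least squares and all names are illustrative assumptions):

```python
import numpy as np

def fit_blend(component_preds, targets):
    """component_preds: (n_examples, n_models) predictions on a held-out set.
    targets: (n_examples,) true ratings. Returns least-squares blend weights."""
    w, *_ = np.linalg.lstsq(component_preds, targets, rcond=None)
    return w

def blend(component_preds, w):
    """Blended prediction: weighted sum of the component predictions."""
    return component_preds @ w
```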
Effect of ensemble size
[Figure: error (RMSE, 0.865-0.881) vs. #predictors in the blend (1 to 57)]
Yehuda Koren, Yahoo! Research
[email protected]
Homepage: www.research.att.com/~volinsky/netflix/