Online recommendations using matrix factorisation Marcus Ljungblad [email protected] Royal Institute of Technology, Stockholm, Sweden Instituto Superior Técnico, Lisbon, Portugal Universitat Politécnica de Catalunya, Barcelona, Spain Thesis presentation

Thesis-presentation: Tuenti Engineering

Dec 04, 2014

Transcript
Page 1: Thesis-presentation: Tuenti Engineering

Online recommendations using matrix factorisation

Marcus Ljungblad [email protected]

Royal Institute of Technology, Stockholm, Sweden

Instituto Superior Técnico, Lisbon, Portugal

Universitat Politécnica de Catalunya, Barcelona, Spain

Thesis presentation

Page 2: Thesis-presentation: Tuenti Engineering

40+ million videos

13+ million users

500 requests/second

306 years

Page 3: Thesis-presentation: Tuenti Engineering

3 reasons:
- find good content
- improve user experience
- increase revenue

great!

Page 4: Thesis-presentation: Tuenti Engineering
Page 5: Thesis-presentation: Tuenti Engineering

3 problems

Page 6: Thesis-presentation: Tuenti Engineering

1: the data

Page 7: Thesis-presentation: Tuenti Engineering

2: the model

Page 8: Thesis-presentation: Tuenti Engineering

2: the model

Page 9: Thesis-presentation: Tuenti Engineering

3: the system

Why so little systems research?

Page 10: Thesis-presentation: Tuenti Engineering

3: the system

Page 11: Thesis-presentation: Tuenti Engineering

3 → 1 problem

Page 12: Thesis-presentation: Tuenti Engineering

Question:

How do you serve recommendations from millions of items to millions of users?

Page 13: Thesis-presentation: Tuenti Engineering
Page 14: Thesis-presentation: Tuenti Engineering

Video ratings (one row per user, one column per video; ? = not yet rated)

2 4 4 ? 1
3 5 ? ? 1
? 4 2 1 ?
1 ? 1 3 3

Page 15: Thesis-presentation: Tuenti Engineering

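import numpy

def calculate_mean_squared_error_of_estimate(MatrixToFactorise, UsersPreferences, MoviesFeatures,
                                             NumberOfLatentFeatures, RegularizationConstant):
    # Assumed helper: the slide calls it but does not show its body.
    # Here: mean squared error over the known (> 0) ratings only; the last
    # two arguments are kept only to match the call on the slide.
    Known = numpy.asarray(MatrixToFactorise) > 0
    Errors = numpy.asarray(MatrixToFactorise) - numpy.dot(UsersPreferences, MoviesFeatures)
    return numpy.mean(Errors[Known] ** 2)

def matrix_factorization(MatrixToFactorise, UsersPreferences, MoviesFeatures, NumberOfLatentFeatures,
                         MaxSteps=5000, LearningRate=0.0002, RegularizationConstant=0.02):
    # MoviesFeatures arrives as (movies x features); work on (features x movies).
    MoviesFeatures = MoviesFeatures.T
    for step in range(MaxSteps):
        for user in range(len(MatrixToFactorise)):
            for movie in range(len(MatrixToFactorise[user])):
                # Only known ratings (> 0) drive the updates.
                if MatrixToFactorise[user][movie] > 0:
                    # Despite its name, this is the prediction error: actual rating minus estimate.
                    estimatedUserMovieFactors = MatrixToFactorise[user][movie] - \
                        numpy.dot(UsersPreferences[user, :], MoviesFeatures[:, movie])
                    for feature in range(NumberOfLatentFeatures):
                        UsersPreferences[user][feature] = UsersPreferences[user][feature] + \
                            LearningRate * (2 * estimatedUserMovieFactors * MoviesFeatures[feature][movie]
                                            - RegularizationConstant * UsersPreferences[user][feature])
                        MoviesFeatures[feature][movie] = MoviesFeatures[feature][movie] + \
                            LearningRate * (2 * estimatedUserMovieFactors * UsersPreferences[user][feature]
                                            - RegularizationConstant * MoviesFeatures[feature][movie])
        # if approximation is good enough, stop iterating
        ApproximationError = calculate_mean_squared_error_of_estimate(
            MatrixToFactorise, UsersPreferences, MoviesFeatures,
            NumberOfLatentFeatures, RegularizationConstant)
        if ApproximationError < 0.001:
            break
    # Return the factors in their original orientation (movies x features).
    return UsersPreferences, MoviesFeatures.T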

Sorry about the slide

Page 16: Thesis-presentation: Tuenti Engineering

[ 1.52 -0.07  0.66  0.76  0.79]
[ 0.79  0.63  0.08  0.9   1.46]
[ 0.56  0.58  0.16  0.43  1.28]
[-0.15  0.7   0.87  1.45 -0.3 ]

[ 0.38  0.91  0.32  0.36  1.22]
[ 0.72  0.98  0.98  1.28  1.75]
[ 1.54 -0.19  0.81  0.61  0.72]
[ 0.22  0.61  0.95  1.18 -0.09]
[-0.13  0.76  0.97  1.04 -0.26]
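Reading the two matrices above as user factors (4 users x 5 latent features) and video factors (5 videos x 5 latent features) is an assumption, since the slide carries no labels. Under that reading, multiplying the first by the transpose of the second yields an estimate for every user/video pair, and the estimates for the pairs that already have a rating (Page 14) land close to those ratings. A minimal NumPy check, with 0 standing in for "?":

```python
import numpy as np

# Known ratings from Page 14; 0 stands for "?" (not yet rated).
ratings = np.array([
    [2, 4, 4, 0, 1],
    [3, 5, 0, 0, 1],
    [0, 4, 2, 1, 0],
    [1, 0, 1, 3, 3],
])

# Assumed reading: one row per user, one column per latent feature.
user_factors = np.array([
    [1.52, -0.07, 0.66, 0.76, 0.79],
    [0.79, 0.63, 0.08, 0.90, 1.46],
    [0.56, 0.58, 0.16, 0.43, 1.28],
    [-0.15, 0.70, 0.87, 1.45, -0.30],
])

# Assumed reading: one row per video, one column per latent feature.
video_factors = np.array([
    [0.38, 0.91, 0.32, 0.36, 1.22],
    [0.72, 0.98, 0.98, 1.28, 1.75],
    [1.54, -0.19, 0.81, 0.61, 0.72],
    [0.22, 0.61, 0.95, 1.18, -0.09],
    [-0.13, 0.76, 0.97, 1.04, -0.26],
])

# Every predicted rating is the dot product of a user row and a video row.
predicted = user_factors @ video_factors.T

# Compare the prediction with the ratings that are actually known.
known = ratings > 0
max_error_on_known = float(np.abs(predicted - ratings)[known].max())

print(np.round(predicted, 2))
print("largest error on known ratings:", round(max_error_on_known, 2))
```

The entries that were "?" are the model's guesses. They depend on the random starting point of the factorisation, so they need not match another run digit for digit.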

Page 17: Thesis-presentation: Tuenti Engineering

[ 2.05  3.97  3.96  2.12  1.01]
[ 2.93  5.02  3.21  1.61  0.98]
[ 2.15  3.95  2.01  1.05  1.1 ]
[ 1.    4.29  1.01  2.96  2.98]

Page 18: Thesis-presentation: Tuenti Engineering

[ 2.05  3.97  3.96  2.12  1.01]
[ 2.93  5.02  3.21  1.61  0.98]
[ 2.15  3.95  2.01  1.05  1.1 ]
[ 1.    4.29  1.01  2.96  2.98]

2 4 4 ? 1
3 5 ? ? 1
? 4 2 1 ?
1 ? 1 3 3
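Putting the predicted matrix next to the original ratings shows where a recommendation can come from: for each user, look only at the videos that were "?" and take the one with the highest predicted rating. The sketch below does exactly that with the numbers from this slide. The function name `recommend_unrated` and the use of 0 for "?" are illustrative choices, not something the deck specifies.

```python
import numpy as np

# Original ratings; 0 stands for "?" (not yet rated).
ratings = np.array([
    [2, 4, 4, 0, 1],
    [3, 5, 0, 0, 1],
    [0, 4, 2, 1, 0],
    [1, 0, 1, 3, 3],
])

# Predicted ratings as shown on the slide.
predicted = np.array([
    [2.05, 3.97, 3.96, 2.12, 1.01],
    [2.93, 5.02, 3.21, 1.61, 0.98],
    [2.15, 3.95, 2.01, 1.05, 1.10],
    [1.00, 4.29, 1.01, 2.96, 2.98],
])


def recommend_unrated(ratings, predicted):
    """Return, per user, the index of the unrated video with the best prediction."""
    # Hide already-rated videos so argmax only sees the unrated ones.
    masked = np.where(ratings == 0, predicted, -np.inf)
    return masked.argmax(axis=1)


best_video_per_user = recommend_unrated(ratings, predicted)
print(best_video_per_user.tolist())
```

For the second user, for example, videos 2 and 3 are unrated; the predictions are 3.21 and 1.61, so video 2 is picked.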

Page 19: Thesis-presentation: Tuenti Engineering

[ 2.05  3.97  3.96  2.12  1.01]
[ 2.93  5.02  3.21  1.61  0.98]
[ 2.15  3.95  2.01  1.05  1.1 ]
[ 1.    4.29  1.01  2.96  2.98]

Page 20: Thesis-presentation: Tuenti Engineering

[Slide filled edge to edge with rating values]

13 million x 40 million ratings
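A quick back-of-the-envelope calculation shows why a matrix of this size is a problem. It uses the user and video counts from Page 2; the 4 bytes per stored rating is purely an assumption for illustration.

```python
# Counts taken from Page 2 of the deck (both are lower bounds there).
users = 13_000_000
videos = 40_000_000

# One cell per user/video pair in a dense ratings matrix.
cells = users * videos

# Assumption for illustration: each rating is stored as a 4-byte float.
bytes_per_rating = 4
dense_bytes = cells * bytes_per_rating
dense_terabytes = dense_bytes / 10**12

print(f"{cells:,} cells, about {dense_terabytes:,.0f} TB if stored densely")
```

Under these assumptions the dense matrix has 520 trillion cells and would need roughly 2,080 TB, which is far beyond what a single machine can hold in memory.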

Page 21: Thesis-presentation: Tuenti Engineering

Clustering

Page 22: Thesis-presentation: Tuenti Engineering

[Figure: clusters 1, 2 and 3]

Millions of items

Page 23: Thesis-presentation: Tuenti Engineering

[Figure: clusters 1, 2 and 3]

Page 24: Thesis-presentation: Tuenti Engineering

[Figure: clusters 1, 2 and 3]

Recommendation Request

Page 25: Thesis-presentation: Tuenti Engineering

[Figure: clusters 1, 2 and 3]

Recommendation Request

Page 26: Thesis-presentation: Tuenti Engineering

Compass = last video
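One possible reading of the clustering slides and the "compass" is sketched below: items are grouped into clusters, the last video a user watched picks the cluster, and only items in that cluster are ranked. Everything here is an assumption made for illustration, including the nearest-centroid assignment, the dot-product score, the toy vectors, and the function names `assign_clusters` and `recommend_from_compass`. The deck does not show how the thesis system does this.

```python
import numpy as np


def assign_clusters(item_vectors, centroids):
    """Assign every item to its nearest centroid (Euclidean distance)."""
    # Resulting shape: (number of items, number of clusters).
    distances = np.linalg.norm(
        item_vectors[:, None, :] - centroids[None, :, :], axis=2
    )
    return distances.argmin(axis=1)


def recommend_from_compass(compass, item_vectors, cluster_of, top_n=2):
    """Rank only the items that share a cluster with the compass item."""
    candidates = np.flatnonzero(cluster_of == cluster_of[compass])
    candidates = candidates[candidates != compass]  # never recommend the compass itself
    scores = item_vectors[candidates] @ item_vectors[compass]
    order = np.argsort(-scores)[:top_n]
    return candidates[order].tolist()


# Toy latent vectors for six videos (hypothetical values).
item_vectors = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [0.8, 0.0],
    [0.1, 1.0],
    [0.0, 0.9],
    [0.2, 0.8],
])
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])

cluster_of = assign_clusters(item_vectors, centroids)
# The user's last video is item 0, so only its cluster is searched.
recommendations = recommend_from_compass(0, item_vectors, cluster_of)
print(cluster_of.tolist(), recommendations)
```

The point of the sketch is the search space: a request touches one cluster instead of all items, which is what makes serving from millions of items plausible.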

Page 27: Thesis-presentation: Tuenti Engineering

[Architecture diagram]

Components: Interface, Delegate, Router, Workers (x3)

Labels: start, request, route, compute, merge / sort, reply top N, top N to json, start
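The labels in the diagram (route, compute, merge / sort, reply top N, top N to json) suggest a scatter-gather flow: each worker ranks its own share of the items, and the partial rankings are merged into one top-N list before being serialised. The following is a single-process sketch of that idea using only the standard library. The function names, the `(video, score)` tuples, and the JSON field names are hypothetical; the real system's interfaces are not shown in the deck.

```python
import heapq
import itertools
import json


def worker_top_n(scored_items, n):
    """Compute step: a worker returns its own n best items, best first."""
    return heapq.nlargest(n, scored_items, key=lambda pair: pair[1])


def merge_top_n(partial_results, n):
    """Merge / sort step: combine already-sorted partial lists and keep n."""
    merged = heapq.merge(*partial_results, key=lambda pair: pair[1], reverse=True)
    return list(itertools.islice(merged, n))


def to_json(top_items):
    """Reply step: serialise the final ranking."""
    return json.dumps([{"video": video, "score": score} for video, score in top_items])


# Three workers, each holding a partition of (video, score) pairs.
partitions = [
    [("a", 0.9), ("b", 0.2), ("c", 0.5)],
    [("d", 0.7), ("e", 0.95)],
    [("f", 0.1), ("g", 0.6)],
]

partials = [worker_top_n(partition, 2) for partition in partitions]
top = merge_top_n(partials, 3)
reply = to_json(top)
print(reply)
```

Because every worker only returns its local top N, the merge step handles at most N items per worker, no matter how many items each worker holds.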

Page 28: Thesis-presentation: Tuenti Engineering

Did it work?

Page 29: Thesis-presentation: Tuenti Engineering

Results
- ~600 requests per second
- latency below 30 ms
- quality is ok

Page 30: Thesis-presentation: Tuenti Engineering

Results: Throughput

Page 31: Thesis-presentation: Tuenti Engineering

Results: Throughput

huh?

Page 32: Thesis-presentation: Tuenti Engineering

[Architecture diagram]

Components: Interface, Delegate, Router, Workers (x3)

Labels: start, request, route, compute, merge / sort, reply top N, top N to json, start

Page 33: Thesis-presentation: Tuenti Engineering

Results: Quality

Queries  Non-zero  MAP
1        41        23%
2        87        25%
3        116       36%
4        165       58%
5        196       74%
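The slide does not expand "MAP"; the sketch below assumes it stands for mean average precision, a common ranking-quality metric. It uses one standard variant in which the average precision of a query is divided by the number of relevant items. Whether the thesis uses this exact normalisation is not stated, and the toy queries are made up for illustration.

```python
def average_precision(recommended, relevant):
    """Average of precision@k taken at every rank k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Assumed normalisation: divide by the number of relevant items.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision."""
    return sum(average_precision(rec, rel) for rec, rel in queries) / len(queries)


# (recommended list, set of relevant items) per query; toy data.
queries = [
    (["a", "b", "c", "d"], {"a", "c"}),  # hits at ranks 1 and 3
    (["x", "y", "z"], {"z"}),            # single hit at rank 3
    (["p", "q"], {"r"}),                 # no hit at all
]

map_score = mean_average_precision(queries)
print(f"MAP = {map_score:.2%}")
```

A query with no relevant item in its recommendations contributes zero, so it pulls the mean down without being excluded from it.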

Page 34: Thesis-presentation: Tuenti Engineering

Summary
- clustering is data
- balanced clusters needed
- scale is ok

Page 35: Thesis-presentation: Tuenti Engineering

?

Page 36: Thesis-presentation: Tuenti Engineering

Photos and imagery used in the presentation (except graphs and logos).
- Amazon recommendations: http://pleated-jeans.com/2010/08/06/amazon-recommendations-for-characters-from-the-office/
- Pile of books: http://www.paper-pills.com/category/gewgaws/page/2/
- Function: http://en.wikipedia.org/wiki/File:Graph_of_example_function.svg
- Server: http://arstechnica.com/gadgets/2007/08/windows-home-server-system-specs-prices-and-launch-date-leaked/
- Tick: http://ia.wikipedia.org/wiki/File:Tick_green_modern.svg
- Phone: http://www.foxbusiness.com/technology/2012/05/22/are-carrier-subsidies-hurting-innovation-and-driving-up-mobile-phone-costs/
- Man in front of computer: http://honesttogawd.blogspot.com.es/