Page 1
Online recommendations using matrix factorisation
Marcus [email protected]
Royal Institute of Technology, Stockholm, Sweden
Instituto Superior Técnico, Lisbon, Portugal
Universitat Politécnica de Catalunya, Barcelona, Spain
Thesis presentation
Page 2
40+ million videos
13+ million users
500 requests/second
306 years
Page 3
3 reasons:
- find good content
- improve user experience
- increase revenue
great!
Page 9
Why so little
systems research?
3: the system
Page 12
Question:
How do you serve recommendations from millions of items to millions of users?
Page 14
Video ratings (rows: users, columns: videos)
2 4 4 ? 1
3 5 ? ? 1
? 4 2 1 ?
1 ? 1 3 3
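A minimal sketch of how the small ratings matrix on this slide might be held in memory, assuming NumPy and using 0 to mark a missing rating (the "?" cells). The 0 convention is an assumption chosen to match the `> 0` check in the factorisation code on the next slide; the variable names are illustrative.

```python
import numpy as np

# 4 users (rows) x 5 videos (columns); 0 stands in for "?" (no rating yet).
ratings = np.array([
    [2, 4, 4, 0, 1],
    [3, 5, 0, 0, 1],
    [0, 4, 2, 1, 0],
    [1, 0, 1, 3, 3],
])

known = ratings > 0  # boolean mask of the cells that carry a real rating
print("known ratings:", int(known.sum()))
print("missing ratings:", int((~known).sum()))
```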
Page 15
import numpy

def matrix_factorization(MatrixToFactorise, UsersPreferences, MoviesFeatures,
                         NumberOfLatentFeatures, MaxSteps=5000,
                         LearningRate=0.0002, RegularizationConstant=0.02):
    # Work with the movie features as (latent features x movies).
    MoviesFeatures = MoviesFeatures.T
    for step in range(MaxSteps):
        for user in range(len(MatrixToFactorise)):
            for movie in range(len(MatrixToFactorise[user])):
                # Only known ratings (> 0) drive the updates.
                if MatrixToFactorise[user][movie] > 0:
                    # Difference between the known rating and the current estimate.
                    predictionError = MatrixToFactorise[user][movie] - \
                        numpy.dot(UsersPreferences[user, :], MoviesFeatures[:, movie])
                    # Gradient step on every latent feature, with regularisation.
                    for feature in range(NumberOfLatentFeatures):
                        UsersPreferences[user][feature] = UsersPreferences[user][feature] + \
                            LearningRate * (2 * predictionError * MoviesFeatures[feature][movie] -
                                            RegularizationConstant * UsersPreferences[user][feature])
                        MoviesFeatures[feature][movie] = MoviesFeatures[feature][movie] + \
                            LearningRate * (2 * predictionError * UsersPreferences[user][feature] -
                                            RegularizationConstant * MoviesFeatures[feature][movie])
        # if approximation is good enough, stop iterating
        # (the error helper is defined elsewhere; it is not shown on this slide)
        ApproximationError = calculate_mean_squared_error_of_estimate(
            MatrixToFactorise, UsersPreferences, MoviesFeatures,
            NumberOfLatentFeatures, RegularizationConstant)
        if ApproximationError < 0.001:
            break
    return UsersPreferences, MoviesFeatures.T
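The code above calls `calculate_mean_squared_error_of_estimate`, which is not shown on the slide. One plausible sketch follows, assuming the helper averages the squared error over the known ratings and adds a regularisation penalty. The body is an assumption for illustration, not the author's implementation; it expects the movie features already transposed to (latent features x movies), as they are at the call site.

```python
import numpy as np

def calculate_mean_squared_error_of_estimate(MatrixToFactorise, UsersPreferences,
                                             MoviesFeatures, NumberOfLatentFeatures,
                                             RegularizationConstant):
    """Hypothetical helper: regularised squared error averaged over known ratings."""
    R = np.asarray(MatrixToFactorise, dtype=float)
    P = np.asarray(UsersPreferences, dtype=float)[:, :NumberOfLatentFeatures]
    Q = np.asarray(MoviesFeatures, dtype=float)[:NumberOfLatentFeatures, :]
    known = R > 0  # missing ratings are stored as 0 and are ignored
    if not known.any():
        return 0.0
    errors = (R - P @ Q)[known]
    penalty = (RegularizationConstant / 2.0) * ((P ** 2).sum() + (Q ** 2).sum())
    return float(((errors ** 2).sum() + penalty) / known.sum())
```

With a perfect fit and no regularisation the helper returns 0.0, so the `< 0.001` stopping test in the loop would trigger.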
Sorry about
the slide
Page 16
[ 1.52 -0.07  0.66  0.76  0.79]
[ 0.79  0.63  0.08  0.9   1.46]
[ 0.56  0.58  0.16  0.43  1.28]
[-0.15  0.7   0.87  1.45 -0.3 ]

[ 0.38  0.91  0.32  0.36  1.22]
[ 0.72  0.98  0.98  1.28  1.75]
[ 1.54 -0.19  0.81  0.61  0.72]
[ 0.22  0.61  0.95  1.18 -0.09]
[-0.13  0.76  0.97  1.04 -0.26]
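Assuming the first matrix above holds one row of latent preferences per user (4 x 5) and the second holds one row of latent features per video (5 x 5), the predicted ratings would be their product. A minimal NumPy sketch using the slide's rounded values follows; the variable names are illustrative, and the output need not match other slides digit for digit, since the factors are rounded and may come from a different run.

```python
import numpy as np

# Latent preferences, one row per user (values copied from the slide).
user_preferences = np.array([
    [ 1.52, -0.07, 0.66, 0.76,  0.79],
    [ 0.79,  0.63, 0.08, 0.90,  1.46],
    [ 0.56,  0.58, 0.16, 0.43,  1.28],
    [-0.15,  0.70, 0.87, 1.45, -0.30],
])

# Latent features, one row per video (values copied from the slide).
video_features = np.array([
    [ 0.38,  0.91, 0.32, 0.36,  1.22],
    [ 0.72,  0.98, 0.98, 1.28,  1.75],
    [ 1.54, -0.19, 0.81, 0.61,  0.72],
    [ 0.22,  0.61, 0.95, 1.18, -0.09],
    [-0.13,  0.76, 0.97, 1.04, -0.26],
])

# Entry [u, v] is the dot product of user u's and video v's latent vectors.
approximation = user_preferences @ video_features.T
print(np.round(approximation, 2))
```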
Page 17
[ 2.05  3.97  3.96  2.12  1.01]
[ 2.93  5.02  3.21  1.61  0.98]
[ 2.15  3.95  2.01  1.05  1.1 ]
[ 1.    4.29  1.01  2.96  2.98]
Page 18
[ 2.05  3.97  3.96  2.12  1.01]
[ 2.93  5.02  3.21  1.61  0.98]
[ 2.15  3.95  2.01  1.05  1.1 ]
[ 1.    4.29  1.01  2.96  2.98]

2 4 4 ? 1
3 5 ? ? 1
? 4 2 1 ?
1 ? 1 3 3
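Comparing the two matrices on this slide, the approximation tracks the known ratings closely and supplies values where the original has "?". A short sketch of reading off those predictions, assuming NumPy and 0 as the marker for a missing rating (both are illustrative choices, not taken from the slides):

```python
import numpy as np

# Original ratings; 0 stands in for "?".
ratings = np.array([
    [2, 4, 4, 0, 1],
    [3, 5, 0, 0, 1],
    [0, 4, 2, 1, 0],
    [1, 0, 1, 3, 3],
])

# Approximation shown on the slide.
approximation = np.array([
    [2.05, 3.97, 3.96, 2.12, 1.01],
    [2.93, 5.02, 3.21, 1.61, 0.98],
    [2.15, 3.95, 2.01, 1.05, 1.10],
    [1.00, 4.29, 1.01, 2.96, 2.98],
])

# Predicted rating for every (user, video) cell that had no rating.
missing = np.argwhere(ratings == 0)
predictions = {(int(u), int(v)): float(approximation[u, v]) for u, v in missing}

# Largest deviation between approximation and the ratings that were known.
worst_known_error = float(np.abs(approximation - ratings)[ratings > 0].max())

for cell, value in sorted(predictions.items()):
    print(cell, value)
print("largest error on known ratings:", round(worst_known_error, 2))
```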
Page 20
[Figure: a slide-filling wall of rating values, illustrating the size of the full ratings matrix]
13 million × 40 million ratings
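Reading the caption as 13 million users by 40 million videos (the figures from the opening slide), a quick back-of-the-envelope calculation shows why the full matrix cannot simply be stored densely. The one byte per rating is an assumption made only for this estimate.

```python
users = 13_000_000
videos = 40_000_000

# Every user/video pair is one cell of the ratings matrix.
cells = users * videos

bytes_per_rating = 1  # assumption: one byte per cell, for illustration only
dense_terabytes = cells * bytes_per_rating / 1e12

print(f"{cells:,} cells")
print(f"{dense_terabytes:.0f} TB if stored densely")
```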
Page 22
Millions of items
[Figure with three parts labelled 1, 2 and 3]
Page 24
Recommendation Request
[Figure with three parts labelled 1, 2 and 3]
Page 26
Compass = last video
Page 27
[Architecture diagram]
Components: Interface, Delegate, Router, Workers (three shown), merge / sort
Message labels: start, request, route, compute, reply top N, top N to json
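The request path in the diagram (route, compute, merge / sort, reply top N, top N to json) could be sketched as follows. This is an assumption about how the pieces fit together, not the thesis implementation: each hypothetical worker scores only its own share of the items and returns a local top N, and a merge step combines the partial lists into the global top N before serialising to JSON. All function names and the sample scores are made up for illustration.

```python
import heapq
import itertools
import json

def worker_top_n(scored_items, n):
    """Compute step: one worker's local top N, best score first."""
    return heapq.nlargest(n, scored_items, key=lambda pair: pair[1])

def merge_top_n(partial_results, n):
    """Merge / sort step: combine already-sorted partial lists into a global top N."""
    merged = heapq.merge(*partial_results, key=lambda pair: pair[1], reverse=True)
    return list(itertools.islice(merged, n))

def to_json(top_items):
    """Reply step: serialise the final list for the client."""
    return json.dumps([{"item": item, "score": score} for item, score in top_items])

# Hypothetical (item, score) pairs, split across three workers.
partitions = [
    [("a", 0.9), ("b", 0.2)],
    [("c", 0.7), ("d", 0.95)],
    [("e", 0.5)],
]

partials = [worker_top_n(part, 3) for part in partitions]
top = merge_top_n(partials, 3)
print(to_json(top))
```

Because every worker returns at most N items, the merge step handles a number of candidates proportional to the worker count rather than to the catalogue size.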
Page 29
Results
- ~600 requests per second
- latency below 30 ms
- quality is ok
Page 30
Results: Throughput
Page 31
Results: Throughput
huh?
Page 32
[Architecture diagram]
Components: Interface, Delegate, Router, Workers (three shown), merge / sort
Message labels: start, request, route, compute, reply top N, top N to json
Page 33
Results: Quality
Queries  Non-zero  MAP
1        41        23%
2        87        25%
3        116       36%
4        165       58%
5        196       74%
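The MAP column presumably stands for mean average precision, a common ranking-quality measure; how the thesis computed it is not shown on the slide. A minimal sketch of the standard definition, with hypothetical function names and made-up sample data:

```python
def average_precision(recommended, relevant):
    """Average of the precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(queries):
    """Mean of the average precision over (recommended, relevant) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(rec, rel) for rec, rel in queries) / len(queries)

# Made-up example: two recommendation lists and the items each user actually liked.
example_queries = [
    (["a", "b", "c"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
print(round(mean_average_precision(example_queries), 4))
```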
Page 34
Summary
- clustering is data
- balanced clusters needed
- scale is ok
Page 36
Photos and imagery used in the presentation (except graphs and logos):
- Amazon recommendations: http://pleated-jeans.com/2010/08/06/amazon-recommendations-for-characters-from-the-office/
- Pile of books: http://www.paper-pills.com/category/gewgaws/page/2/
- Function: http://en.wikipedia.org/wiki/File:Graph_of_example_function.svg
- Server: http://arstechnica.com/gadgets/2007/08/windows-home-server-system-specs-prices-and-launch-date-leaked/
- Tick: http://ia.wikipedia.org/wiki/File:Tick_green_modern.svg
- Phone: http://www.foxbusiness.com/technology/2012/05/22/are-carrier-subsidies-hurting-innovation-and-driving-up-mobile-phone-costs/
- Man in front of computer: http://honesttogawd.blogspot.com.es/